I’ve spent a lot of the last few months working on some research that mostly amounts to running lots of black-box type computer code. I spent weeks just getting the stuff to compile, then weeks getting all the data files positioned in the right places so the programs could find them. This stuff is all home-brewed, hacked together a lot of the time without comments and in unintuitive structures. The priority was being able to put data in one end and get something out the other, not user-friendliness. The only documentation that exists is the brain of the post-doc and her supervisor in the offices down the hall.

So, when we try to improve one of these programs, you can count on a lot of debugging before anything gets fixed. Often the things just won’t compile, or there’s a typo so it looks for the wrong input file. Recently one of these programs was rewritten, and was tested a little to see that it actually worked. On a small scale it worked fine, but when we started submitting jobs with it on our big computing cluster, 100 gigabytes of data would go in and exactly 0 bytes would come out. Segmentation fault.

It was not the size of the data we were putting through that caused the segmentation fault, but the number of processes we were running. It turned out the segmentation fault was not repeatable when running one copy fo the program at once, but when you run dozens of copies simultaneously, when one fails they all die.

The only way I could really deal with this was run the program in a debugger and just hope to catch the fault. For about a week I had a terminal window open with gdb running my program. Every few hours I would check to see that it succeeded in its job and then would run it again, hoping for that elusive seg fault. Days went by. Program exited successfully. Program exited successfully. Program exited successfully. My supervisor would ask if I had solved the program yet. I would look at him and say, “Program exited successfully.”

Until finally I decided to give up. I told my supervisor, “If this run doesn’t die, then I’m giving up and going back to the old code.” Within two minutes of that conversation…. segmentation fault! Like the rarest of pokemon in an Ultra Ball, my debugger had finally succeeded in capturing the bug in my program.

And that, children, is why playing video games will help you do well in school. All the time my boyfriend made me play pokemon with him back in senior year of undergrad was well spent after all.

Random FAQ Comments (0)

Leave a Reply