Tracing Memory Issues

I wasted two weeks of my life hunting for a bug that corrupts the heap of a server application. The application runs on Unix-like OSs and is fairly complex: it’s multithreaded and consists of a front-end and several back-ends. The server is written in C++ and has an in-house garbage collector. It exposes a CORBA interface, and several clients written in different languages (Java, C#, Perl) connect to it.

Two weeks ago I made a small change, and ever since I’ve been getting a “Segmentation fault” error. It’s really frustrating because the error is hard to reproduce: the server runs for several hours under serious load before the error occurs.

Plan A

As a first step I built the application with debugging information for GDB (-ggdb) and with optimizations disabled (-O0), and then I started debugging it. The error proved to be a moving target, because it occurred at a different place every time. Of course, tracing memory issues is nothing new to me (as for every C++ developer, I guess), so I did not fall into despair.
It was time for Plan B.

Plan B

My Plan B on POSIX-compatible OSs has always been Valgrind. Valgrind is a great memory debugger. If you have not used it, you should definitely give it a try. Valgrind comes with several tools, but I’ve used only Memcheck so far.
Memcheck can detect various issues, like:

  • Memory leaks
  • Use of uninitialized memory
  • Reads/writes outside of an allocated block
  • Mismatched use of malloc() / new / new[] versus free() / delete / delete[]

It’s very useful, but it did not find errors this time. Maybe it was time to fall into despair?

Plan C

I did not have Plan C beforehand, but while I was googling for a solution I stumbled upon efence – Electric Fence Malloc Debugger. It “helps you detect two common programming bugs: software that overruns the boundaries of a malloc() memory allocation, and software that touches a memory allocation that has been released by free()”. All you have to do is link your program with the library libefence.a (-lefence).
I gave it a try, and thanks to efence I finally found an issue:

  • I was using a std::map from several threads without protection, and as we know, STL containers aren’t safe for concurrent modification.

I fixed it by using a mutex to lock the critical section, but the seg faults did not stop. Obviously there are other issues with my code.
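For illustration, here is a minimal sketch of that kind of fix, not the server's actual code. The class SessionTable and the helper fill_concurrently are hypothetical names; the only point is that every read and write of the std::map happens while a std::mutex is held.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical wrapper: the map is private, so every access has to go
// through a member function that takes the mutex first.
class SessionTable {
public:
    void set(int id, const std::string& value) {
        std::lock_guard<std::mutex> lock(mutex_);
        entries_[id] = value;
    }

    // Copies the value out while the lock is still held, so the caller
    // never keeps a reference into the map.
    bool get(int id, std::string& out) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = entries_.find(id);
        if (it == entries_.end()) return false;
        out = it->second;
        return true;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return entries_.size();
    }

private:
    mutable std::mutex mutex_;  // mutable so const readers can lock it
    std::map<int, std::string> entries_;
};

// Starts `threads` writers, each inserting `per_thread` distinct keys,
// waits for all of them and returns the final number of entries.
std::size_t fill_concurrently(SessionTable& table, int threads, int per_thread) {
    assert(threads > 0 && per_thread > 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&table, t, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                table.set(t * per_thread + i, "worker " + std::to_string(t));
            }
        });
    }
    for (auto& worker : workers) worker.join();
    return table.size();
}
```

Keeping the map private is a design choice: it makes it impossible to touch the container without going through the lock, which is exactly the mistake that caused the crash. On most Unix toolchains the program has to be built with -pthread.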
I still rely on efence to detect the remaining issue(s), but it consumes a lot of memory and I have to restart my application pretty often.
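The memory appetite follows from how Electric Fence catches overruns: according to its man page, it uses the virtual memory hardware to place an inaccessible page right after each allocation, and it makes freed memory inaccessible too. Even a tiny allocation therefore costs at least two pages. Below is a rough sketch of that idea, not Electric Fence's actual code. GuardedBlock, guarded_alloc and guarded_free are hypothetical names, and the sketch assumes a POSIX system with mmap and mprotect. It also ignores alignment, which the real tool handles.

```cpp
#include <sys/mman.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>

// Hypothetical bookkeeping for one guarded allocation.
struct GuardedBlock {
    void* base = nullptr;            // start of the whole mapping
    std::size_t mapped = 0;          // bytes mapped, guard page included
    unsigned char* data = nullptr;   // what the caller would get from malloc()
    unsigned char* guard = nullptr;  // first byte of the inaccessible page
};

// Maps enough pages for `size` bytes plus one extra page, makes the extra
// page inaccessible, and positions the data so that it ends exactly where
// the guard page begins. Returns an empty block on failure.
GuardedBlock guarded_alloc(std::size_t size) {
    assert(size > 0);
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t data_pages = (size + page - 1) / page;
    const std::size_t mapped = (data_pages + 1) * page;

    void* base = mmap(nullptr, mapped, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return GuardedBlock{};

    unsigned char* guard = static_cast<unsigned char*>(base) + data_pages * page;
    if (mprotect(guard, page, PROT_NONE) != 0) {
        munmap(base, mapped);
        return GuardedBlock{};
    }

    GuardedBlock block;
    block.base = base;
    block.mapped = mapped;
    block.data = guard - size;  // last data byte sits right before the guard
    block.guard = guard;
    return block;
}

// "Frees" the block by making the whole mapping inaccessible instead of
// returning it, so that a later use of the stale pointer faults.
bool guarded_free(const GuardedBlock& block) {
    if (block.base == nullptr) return false;
    return mprotect(block.base, block.mapped, PROT_NONE) == 0;
}
```

With this layout a write to block.data[size] faults at the offending instruction instead of silently corrupting a neighbouring block, which is what makes the bug visible in the debugger. The price is the one visible in the sketch: a 100-byte request ties up two whole pages, and freed blocks are kept mapped rather than reused.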

Plan D?

So, while the tracing of the error continues:

  • Is there a “Plan D”?
  • Am I missing something?
  • Have you been in a situation like mine, and how did you solve it?

One response to “Tracing Memory Issues”

  1. asdasdasad says:

    I think the best empirical way to find the error is to split the application into small sub-applications and test them one by one with the largest possible data that the server will face.

    Divide and Conquer.
