One of our programs was leaking memory. Not much, but enough that Tech Ops were not going to allow us to put it into production. Fair enough: I wouldn’t allow it either if I were on call.
So I did the obvious: started looking for the leak. This is not as easy as I’d like.
First I tried Test::LeakTrace, which gives lots of information, but:
- It gives too much information
- It slows things down unbearably
To give an idea of the slowdown: a test that usually runs in less than a minute took about a week when run under Test::LeakTrace. Since I planned to run several tests multiple times, it was clearly not a viable option.
Second thing I tried: look at /proc/self/stat to see how much memory the process is using. The plan of attack was:
- Run some test code 10 times
- Measure memory
- Run some test code 20 times
- Measure memory
- Etc…
This did not work: I was expecting to see a linear increase in used memory, but the numbers I got looked random. Perl’s allocator is clever, the kernel’s allocator is clever, and I’m not clever enough to figure out what they’re doing.
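For reference, the /proc/self/stat plan can be sketched like this. It is a minimal sketch that assumes Linux, where field 23 of /proc/self/stat is the virtual memory size in bytes and field 24 is the resident set size in pages; `proc_mem` and `test_code` are hypothetical names, and `test_code` just keeps every string it builds to simulate a leak:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Return (vsize in bytes, rss in pages) for the current process. Linux only.
sub proc_mem {
    open my $fh, '<', '/proc/self/stat'
        or die "cannot read /proc/self/stat: $!";
    my $line = <$fh>;
    close $fh;
    # Field 2 is "(command name)" and may contain spaces, so keep only what
    # follows the last ')'; that text starts at field 3.
    my ($rest) = $line =~ /^.*\)\s+(.*)$/s
        or die "unexpected /proc/self/stat format";
    my @f = split ' ', $rest;
    return ($f[23 - 3], $f[24 - 3]);    # fields 23 (vsize) and 24 (rss)
}

# Hypothetical code under test: it keeps every string it builds.
my @kept;
sub test_code { push @kept, 'x' x 1000 }

# Run the code in growing batches, measuring after each batch.
for my $n (10, 20, 30, 40) {
    test_code() for 1 .. $n;
    my ($vsize, $rss) = proc_mem();
    printf "after %2d more runs: vsize=%d bytes, rss=%d pages\n",
        $n, $vsize, $rss;
}
```

On a real program these numbers can stay flat for a while and then jump, because both perl and the C library reuse freed memory and can request it from the kernel in large chunks; that is consistent with the random-looking results described above.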
So I started looking at perlguts, perldebguts, perlhacktips, and other scary documentation files. They talk about “SV allocation logging”, “memory profiling”, and so on. But getting those requires recompiling Perl. Was I brave enough?
Well, normally I wouldn’t be, but perlbrew makes compiling a Perl almost easy. I’ll spare you the three failed attempts (I found the configuration switches difficult to understand) and show a condensed version of the script I ended up using:
```bash
#!/bin/bash

# rebuild the debugging perl from scratch (-n: skip tests, -j5: parallel make)
perlbrew switch perl-5.14.2
perlbrew uninstall debug-perl
perlbrew install perl-5.14.2 -n -j5 --as debug-perl \
    -DDEBUGGING -DPERL_MEM_LOG -DDEBUG_LEAKING_SCALARS -DPERL_MEM_LOG \
    -Dusedebugging -Dusemymalloc
perlbrew switch debug-perl
perlbrew install-cpanm

# CPAN dependencies, installed without running their tests (-n)
cpanm -n <<EOF
Acme::MetaSyntactic
Alien::ActiveMQ
App::Ack
…
parent
true
version
EOF

# a distribution that needs a local patch before installing
cd /tmp
rm -rf Data-Rx*
tar zxvf ~/src/CPAN/Data-Rx-0.007.tar.gz
cd Data-Rx*
patch -p1 < ~/src/CPAN_distroprefs/Data-Rx-0.007.patch
perl Makefile.PL
make install

cd ~/src/catalyst-engine-stomp/
perl Makefile.PL
make install

cd ~/src/Data-MultiValued/
dzil install

# etc etc, for our in-house modules
cd
```
This allowed me to have a working Perl with all the dependencies I needed. Still, things like PERL_MEM_LOG were not working, and the values returned by Devel::Peek were not exactly clear to me.
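Since Devel::Peek’s memory statistics (`mstat`, `mstats_fillhash`) only work on a perl built with its own malloc, it can save time to check what the running perl was actually built with before trusting any numbers. A small sketch using the core Config module; the DEBUGGING check assumes that a `-DDEBUGGING` build records that flag in `$Config{ccflags}`, which is worth confirming against the compile-time options listed by `perl -V` on your own build:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Config;    # core module exposing the values chosen at Configure time

# 'y' when perl was configured with -Dusemymalloc; Devel::Peek's mstats
# functions need this.
my $mymalloc = ($Config{usemymalloc} // 'n') eq 'y';

# Assumption: a -DDEBUGGING build lists -DDEBUGGING among the C compiler flags.
my $debugging = ($Config{ccflags} // '') =~ /-DDEBUGGING\b/ ? 1 : 0;

printf "perl %vd at %s\n", $^V, $^X;
printf "  usemymalloc: %s\n", $mymalloc  ? 'yes' : 'no';
printf "  DEBUGGING:   %s\n", $debugging ? 'yes' : 'no';
```

From the shell, `perl -V:usemymalloc` prints the same Config value.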
Asking on #london.pm revealed that the memory logging facilities were removed from Perl a long time ago, and that nobody knows how to properly read the values from Devel::Peek. So I took some guesses, and wrote this program:
```perl
#!/usr/bin/env perl
use 5.012;          # for say, //, and keys/each on arrays
use strict;
use warnings;
use Devel::Peek;    # mstats_fillhash needs a perl built with -Dusemymalloc
use MyTest;

{
    # pre-allocate the bookkeeping, so the measurement itself allocates as
    # little as possible
    my %report;
    my @diffs = (100) x 100;

    sub measure {
        my (%args) = @_;
        my $code    = $args{code}    // sub {};
        my $cleanup = $args{cleanup} // sub {};
        my $loops   = $args{loops}   // [1];

        # run once first, so one-off allocations don't count as growth
        $code->();
        mstats_fillhash(%report);
        $diffs[0] = $report{total} - $report{totfree};    # bytes in use

        keys @$loops;    # reset the each() iterator
        while (my ($i, $count) = each @$loops) {
            say "$i: looping $count times";
            $code->() for 1 .. $count;
            $cleanup->();
            mstats_fillhash(%report);
            $diffs[$i+1] = $report{total} - $report{totfree};
            say " diff: ", $diffs[$i+1] - $diffs[$i];
            say '';
        }

        # batch number, batch size, total growth, growth per call
        for my $i (1 .. @$loops) {
            printf "% 3d (% 5d times): % 10d % 10.1f\n",
                $i, $loops->[$i-1],
                $diffs[$i] - $diffs[$i-1],
                ($diffs[$i] - $diffs[$i-1]) / $loops->[$i-1];
        }
    }
}

measure
    code  => sub { MyTest->test_it },
    loops => [ 10, 20, 30, 40 ];
```
This, finally, got me a roughly linear increase in memory usage. Then, it was a matter of bisecting the code paths inside the test, checking which changes made the diffs go to 0.
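The bisection comes down to one number per batch: how much memory each extra call leaves behind, which stays roughly constant for a real leak and drops to 0 once the leaking path is gone. A minimal sketch of that bookkeeping, with the memory probe passed in as a code reference so that on a -Dusemymalloc perl it could wrap the `total - totfree` computation from the program above; `growth_per_call`, the two variants and the fake probe are all hypothetical:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Run $code in batches of the sizes listed in $loops and return, for each
# batch, the growth per call reported by $probe (a sub returning bytes in use).
sub growth_per_call {
    my (%args) = @_;
    my ($code, $probe, $loops) = @args{qw(code probe loops)};
    $code->();                  # warm-up call, so one-off allocations don't count
    my $before = $probe->();
    my @per_call;
    for my $count (@$loops) {
        $code->() for 1 .. $count;
        my $after = $probe->();
        push @per_call, ($after - $before) / $count;
        $before = $after;
    }
    return @per_call;
}

# Two hypothetical variants of a code path: one retains data, one doesn't.
my @kept;
my %variant = (
    leaky => sub { push @kept, 'x' x 64 },
    clean => sub { my $tmp = 'x' x 64; return length $tmp },
);

# Stand-in probe so the sketch runs on any perl: it counts the 64-byte
# strings held in @kept instead of reading real allocator statistics.
my $fake_probe = sub { 64 * @kept };

for my $name (sort keys %variant) {
    my @growth = growth_per_call(
        code  => $variant{$name},
        probe => $fake_probe,
        loops => [ 10, 20, 30 ],
    );
    printf "%-5s %s\n", $name, join ' ', map { sprintf '%.1f', $_ } @growth;
}
```

Run as is, it prints `clean 0.0 0.0 0.0` and `leaky 64.0 64.0 64.0`. Passing the probe in, rather than hard-wiring `mstats_fillhash`, is only a design choice that keeps the loop testable on a stock perl.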
In the end, it was Benchmark::Timer that was allocating memory. Yes, I know, it’s designed to work that way, and I have no one to blame but myself for using a library without reading all its code.
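Which of Benchmark::Timer’s internals grew isn’t shown here. One plausible mechanism, offered as an assumption rather than a reading of its source, is that a timer able to report statistics has to remember every sample, so its memory grows with each start/stop pair even though nothing is leaked in the reference-counting sense. The two classes below are hypothetical (`SampleTimer` and `TotalTimer` are not Benchmark::Timer’s code) and only contrast the two designs:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Time::HiRes ();    # high-resolution time(); nothing imported

# Hypothetical timer that keeps every sample, e.g. to report statistics later.
package SampleTimer;
sub new    { bless { samples => [] }, shift }
sub start  { $_[0]{t0} = Time::HiRes::time() }
sub stop   { my $s = shift; push @{ $s->{samples} }, Time::HiRes::time() - $s->{t0} }
sub stored { scalar @{ $_[0]{samples} } }    # one more value per stop()

# Hypothetical timer that keeps only a count and a running total.
package TotalTimer;
sub new    { bless { count => 0, total => 0 }, shift }
sub start  { $_[0]{t0} = Time::HiRes::time() }
sub stop   { my $s = shift; $s->{count}++; $s->{total} += Time::HiRes::time() - $s->{t0} }
sub stored { 2 }                              # count and total, however many calls
sub mean   { $_[0]{count} ? $_[0]{total} / $_[0]{count} : 0 }

package main;

my $samples = SampleTimer->new;
my $totals  = TotalTimer->new;
for (1 .. 1000) {
    for my $timer ($samples, $totals) {
        $timer->start;
        $timer->stop;
    }
}
printf "SampleTimer keeps %d values, TotalTimer keeps %d\n",
    $samples->stored, $totals->stored;
```

Both designs are reasonable for a benchmark run that finishes; only the first becomes a problem inside a long-running process that keeps timing the same code path.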
So I’ve removed Benchmark::Timer from the code (I wasn’t using its results anyway), and now the program can go to production. It only took me a week…