A Technique for Finding and Fixing Memory Leaks in Ruby C Extensions

7/26/2011 - Tim Elliott

Here is a description of how I found and fixed a memory leak in the Ruby Nokogiri Library. This guide should help in finding and fixing similar issues with other Ruby extensions.

I was inspired to write this guide while working on a Ruby C extension of my own. I came across a good way to discover memory leaks. This method requires that the library in question has an exhaustive test suite and that the project uses MiniTest, though it should be possible to tweak this method to work with RSpec and Test::Unit as well.

Looping Unit Tests

The trick is to loop a set of unit tests and to watch memory consumption along the way. Many projects have exhaustive test suites, but the developers only run the tests once to make sure the assertions pass.

In order to loop a test written with MiniTest, load the test script with "require" and then trigger MiniTest in a loop:

require "test_file.rb"
loop{ MiniTest::Unit.new.run }

The above will cause the test to loop indefinitely without reloading the interpreter in between. When executing this, make sure to include the root library directory and the test directory in the load path:

$ ruby -Ilib -Itest -e "require 'test_file.rb'; loop{ MiniTest::Unit.new.run }"

Watching Memory Consumption

Once this is running, switch to another terminal and take a look at the test process:

$ watch "ps aux | grep Itest"

Every 2.0s: ps aux | grep Itest

tim      30386  0.0  0.0   4100   616 pts/1    S+   11:30   0:00 sh -c ruby -Ilib -Itest -e "require 'test_file.rb'; loop{ MiniTest::Unit.new.run }"
tim      30388  100  0.3  46212 12832 pts/1    Rl+  11:30   2:40 ruby -Ilib -Itest -e require 'test_file.rb'; loop{ MiniTest::Unit.new.run }
tim      30520  0.5  0.0  10112  1388 pts/0    S+   11:33   0:00 watch ps aux | grep Itest
...

Some tests will grow for a little while and eventually settle, but once memory use has been increasing for over 30 seconds, you may be on to a memory leak.

Automate or Die

In order to make this practical, add the following tasks to your Rakefile:

desc 'Run a test in looped mode so that you can look for memory leaks.'
task 'test_loop' do
  code = %Q[require '#{$*[1]}'; loop{ MiniTest::Unit.new.run }]
  cmd = %Q[ruby -Ilib -Itest -e "#{ code }"]
  system cmd
end

desc 'Watch Memory use of a looping test'
task 'test_loop_mem' do
  system 'watch "ps aux | grep Itest"'
end

You can now invoke and watch a looping set of tests with the following commands:

$ rake test_loop test_file.rb
$ rake test_loop_mem

Applying this Technique to Nokogiri

Armed with this new memory leak hunting tool, I got the latest source of Nokogiri. Nokogiri makes a great reference for any Ruby C extension writer, and it has an exhaustive test suite to boot.

I quickly found that test_xslt_transforms.rb grows in memory usage pretty consistently. After invoking the test loop in one terminal, the process watcher showed increasing memory use:

Every 2.0s: ps aux | grep Itest

tim      31805  0.0  0.0   4096   608 pts/0    S+   11:52   0:00 sh -c ruby -Ilib -Itest -e "require 'test_xslt_transforms.rb'; loop{ MiniTest::Unit.new.run }"
tim      31807 99.6  1.1  97028 47332 pts/0    Rl+  11:52   0:48 ruby -Ilib -Itest -e require 'test_xslt_transforms.rb'; loop{ MiniTest::Unit.new.run }
tim      31830  0.0  0.0   4096   612 pts/1    S+   11:52   0:00 sh -c watch "ps aux | grep Itest"

In order to locate the offending unit test I comment out half of the running tests and re-run the loop. In this fashion, I am able to narrow the issue down to the following test in test_xslt_transforms.rb:

def test_xslt_parse_error
  xslt_str = <<-EOX
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Not well-formed: -->
  <xsl:template match="/"/>
    <values>
      <xsl:for-each select="//*">
        <value>
          <xsl:value-of select="@id"/>
        </value>
      </xsl:for-each>
    </values>
  </xsl:template>
</xsl:stylesheet>}
  EOX
  assert_raises(RuntimeError) { Nokogiri::XSLT.parse(xslt_str) }
end

So, when a call to XSLT.parse results in a RuntimeError, there is a possibility of a memory leak. test_xslt_transforms.rb has some tests that don't cause an exception and they don't leak memory, so it seems that this leak is specific to exceptions.

Nokogiri Under the Hood

XSLT.parse is defined in lib/nokogiri/xslt.rb. It parses the given argument as an XML document and hands it to Stylesheet.parse_stylesheet_doc:

def parse string, modules = {}
  modules.each do |url, klass|
    XSLT.register url, klass
  end

  Stylesheet.parse_stylesheet_doc(XML.parse(string))
end

Assuming that XML.parse is not leaking memory, lets take a closer look at Stylesheet.parse_stylesheet_doc, which is defined in ext/nokogiri/xslt_stylesheet.c. The method does the following:

Give libxslt a callback for error handling (xsltSetGenericErrorFunc).
Make a copy of the xml document (xmlCopyDoc).
Pass the copy to xsltParseStylesheetDoc.
Unset the callback.
Return a Stylesheet object which wraps the native libxslt stylesheet struct with a deallocator.

static VALUE parse_stylesheet_doc(VALUE klass, VALUE xmldocobj)
{
    xmlDocPtr xml ;
    xsltStylesheetPtr ss ;
    Data_Get_Struct(xmldocobj, xmlDoc, xml);
    exsltRegisterAll();

    xsltSetGenericErrorFunc(NULL, xslt_generic_error_handler);

    ss = xsltParseStylesheetDoc(xmlCopyDoc(xml, 1)); /* 1 => recursive */

    xsltSetGenericErrorFunc(NULL, NULL);

    return Data_Wrap_Struct(klass, NULL, dealloc, ss);
}

Note that we are making a copy of the XML document here. Presumeably, this copy will be freed somewhere else.

Lets take a look at the xslt_generic_error_handler callback as well, from ext/nokogiri/xslt_stylesheet.c:

NORETURN(static void xslt_generic_error_handler(void * ctx, const char *msg, ...));
static void xslt_generic_error_handler(void * ctx, const char *msg, ...)
{
  char * message;
  VALUE exception;

  va_list args;
  va_start(args, msg);
  vasprintf(&message, msg, args);
  va_end(args);

  exception = rb_exc_new2(rb_eRuntimeError, message);
  vasprintf_free(message);
  rb_exc_raise(exception);
}

This the call to rb_exc_raise() is our first culprit, and it is similar to an issue that I have seen in Nokogiri before. When you raise an exception, Ruby performs a longjmp and the method never returns. However, C libraries often expect callbacks to return so that they can clean up after themselves. If you prevent this from happening, you can inadvertently leak memory.

Digging Deeper -- libxslt

Let us take a look at what libxslt does when it parses a document and encounters an error. After poking around in the libxslt sources a little, I found that every error increments a struct member named errors. Lets look at all of the places where libxslt has an alternate code path for this struct member:

$ git clone git://git.gnome.org/libxslt
$ cd libxslt
$ grep -r ">errors" * | grep if | grep -v ++

libxslt/xslt.c:	    if (retStyle->errors != 0) {
libxslt/xslt.c:	if (retStyle->errors != 0) {
xsltproc/xsltproc.c:		    if (cur->errors != 0) {
xsltproc/xsltproc.c:    if ((cur != NULL) && (cur->errors == 0)) {

We can ignore xsltproc.c since this appears to be a commandline interface to libxslt and is not part of the library itself. The first of the two occurrences in xslt.c is commented out with the #XSLT_REFACTORED constant, so the only place where there appears to be a different code path for the errors struct member is this piece of code:

    if (retStyle != NULL) {
	if (retStyle->errors != 0) {
	    retStyle->doc = NULL;
	    if (parentStyle == NULL)
		xsltCleanupStylesheetTree(doc,
		    xmlDocGetRootElement(doc));
	    xsltFreeStylesheet(retStyle);
	    retStyle = NULL;
	}
    }

Note that the error path sets the associated XML document for the stylesheet to NULL here. Lets dig just a little further and take a look at xsltFreeStylesheet() in xslt.c:

void
xsltFreeStylesheet(xsltStylesheetPtr style)
{
    if (style == NULL)
        return;

    xsltFreeKeys(style);

    ...

    /*
    * Better to free the main document of this stylesheet level
    * at the end - so here.
    */
    if (style->doc != NULL) {
        xmlFreeDoc(style->doc);
    }

    xmlDictFree(style->dict);

    memset(style, -1, sizeof(xsltStylesheet));
    xmlFree(style);
}

Remember that the error path above set the associated XML document to NULL. This means that xsltFreeStylesheet() will not free the document when it encountered a parsing error. The application which invokes the parser is responsible for freeing the document.

Solution

The first fix is to prevent Nokogiri from raising an exception in the error callback method. Instead we print out a warning to stdout when $VERBOSE is set:

static void xslt_generic_error_handler(void * ctx, const char *msg, ...)
{
  char * message;

  va_list args;
  va_start(args, msg);
  vasprintf(&message, msg, args);
  va_end(args);

  rb_warning("%s", message);

  vasprintf_free(message);
}

We also need to free the xml document when parsing fails. Lets also throw an exception along the way:

static VALUE parse_stylesheet_doc(VALUE klass, VALUE xmldocobj)
{
    xmlDocPtr xml, xml_cpy;
    xsltStylesheetPtr ss ;
    Data_Get_Struct(xmldocobj, xmlDoc, xml);
    exsltRegisterAll();

    xsltSetGenericErrorFunc(NULL, xslt_generic_error_handler);

    xml_cpy = xmlCopyDoc(xml, 1); /* 1 => recursive */
    ss = xsltParseStylesheetDoc(xml_cpy);

    xsltSetGenericErrorFunc(NULL, NULL);

    if (!ss) {
	xmlFreeDoc(xml_cpy);
	rb_raise(rb_eRuntimeError, "Unable to Parse the XSLT Document.");
    }

    return Data_Wrap_Struct(klass, NULL, dealloc, ss);
}

Exercise for the Reader

Fixing bugs like this is challenging, but it is a great way even for beginners to learn how to work with C code and for learning how Ruby works internally. I stopped at the first memory leak I found in Nokogiri. Perhaps there are more that can be found and fixed using a similar technique. You could also find other Ruby C extensions and make them better by finding and fixing memory leaks.

Another fun, though challenging, exercise would be to figure out how to replace the generic exception that the solution above throws, and replace it with an exception that contains all of the errors that were thrown.

Discussion

The actual bugfix and some lively discussion in Nokogiri Pull Request #498 .

Lots of opinions in this Reddit post .

A very nice mention in The Ruby Show #175. Thank you :)

Got lots of readers from Ruby Weekly Issue #52 .

RubyFlow kindly allowed me to do some self promotion .