Tuesday, December 28, 2010

Amazing Speedup

A library had unit tests that ran for almost 600 seconds. Two small changes dropped the run time to 26 seconds.

I was amazed.

Step 1. I turned on the cProfile. I added two methods to the slowest unit test module.

def profile():
    import cProfile
    cProfile.run( 'main()', 'the_slow_module.prof' )

def report():
    import pstats
    p = pstats.Stats( 'the_slow_module.prof' )
Now I can add profiling or simply review the report. Looking at the "callees" provided some hints as to why a particular method was so slow.

Step 2. I replaced ElementTree with cElementTree (duh.) Everyone should know this. I didn't realize how much this mattered. The trick is to note how much time was spent doing XML parsing. In the case of this unit test suite, it was a LOT of time. In the case of the overall application that uses this library, that won't be true.

Step 3. The slowest method was assembling a list. It did a lot of list.append(), and list.__len__(). It looked approximately like the following.

def something( self ):
result= []
for index, value in some_source:
    while len(result)+1 != index:
        result.append( None )
    result.append( SomeClass( value ) )
return result

This is easily replaced by a generator. The API changes, so every use of this method function may need to be modified to use the generator instead of the list object.

def something_iter( self ):
 counter= 0
 for index, value in some_source:
     while counter+1 != index:
         yield None
         counter += 1
     yield SomeClass( value )
     counter += 1

The generator was significantly faster than list assembly.

Two minor code changes and a significant speed-up.


  1. Interesting. You may consider just using a list comprehension or generator expression as well for that second piece:

    [ [None]*index + [SomeClass(value)] for index,value in source ]

    itertools.chain.from_iterable( ( itertools.chain( itertools.repeat(None, index), [SomeClass(value)] ) for index, value in source ) )

    This arguably simplifies the code by removing the explicit "counter" variable, and the nested loop.

  2. I take it index from some_source must be increasing and start at least with 1? (The latter because starting at zero, which seems more natural for a general index, results in an infinite loop: counter + 1 will never equal 0. Was this an error in simplifying for the blog?)

    @Kurt: You can't eliminate counter as in either of those, as the number of Nones depends on the difference between successive indexes, not on index alone. Your code gives a different result.

    And realizing the importance is the difference between successive indexes leads me to write (how to format code for blogspot?):

    def something_iter():
    ..cur_index = 1 # instead of 0 for reason above
    ..for next_index, value in some_source:
    ....for _ in xrange(cur_index, next_index):
    ......yield None
    ....cur_index = next_index
    ....yield SomeClass(value)

    I don't consider this any significant improvement over the while loop version, but I think it would help prevent misunderstandings similar to Kurt's.

  3. The index is 1-based. It's the column number from reading Excel spreadsheets.

  4. This comment has been removed by the author.

  5. This comment has been removed by the author.

  6. This comment has been removed by the author.

  7. Oh, I see. The data is the index and value from a column, and the code is to fill in the "missing" numbers with Nones?

    A defaultdict may be perfect for this. Since source is already in the form of a series of a list of (index, value) tuples, we can just pass this straight to the constructor.

    >>> source = [ (1, 'a'), (5, 'b') ]
    >>> import collections
    >>> data = collections.defaultdict(lambda: None, source)

    (add a list comprehension to call SomeClass constructor: collections.defaultdict(lambda: None, [(k, SomeClass(v)) for k,v in source])

    Then your code can just treat data as if it were a list for indexing.

    >>> data[1]
    >>> data[2]
    >>> data[3]
    >>> data[4]
    >>> data[5]

    If you want to make it into a real list (for slicing, etc) you can do this with a simple comprehension:

    >>> datalist = [data[a] for a in xrange(max(data.keys())+1)]
    >>> datalist
    [None, 'a', None, None, None, 'b']

    You could also replace this with a generator expression if you wanted to save memory I guess, but you may as well leave it as a defaultdict in that case.