S.Lott-Software Architect: Amazing Speedup

Tuesday, December 28, 2010

Amazing Speedup

A library had unit tests that ran for almost 600 seconds. Two small changes dropped the run time to 26 seconds.

I was amazed.

Step 1. I turned on the cProfile. I added two methods to the slowest unit test module.

def profile():
    import cProfile
    cProfile.run( 'main()', 'the_slow_module.prof' )
    report()

def report():
    import pstats
    p = pstats.Stats( 'the_slow_module.prof' )
    p.sort_stats('time').print_callees(24)

Now I can add profiling or simply review the report. Looking at the "callees" provided some hints as to why a particular method was so slow.

Step 2. I replaced ElementTree with cElementTree (duh.) Everyone should know this. I didn't realize how much this mattered. The trick is to note how much time was spent doing XML parsing. In the case of this unit test suite, it was a LOT of time. In the case of the overall application that uses this library, that won't be true.

Step 3. The slowest method was assembling a list. It did a lot of list.append(), and list.__len__(). It looked approximately like the following.

def something( self ):
result= []
for index, value in some_source:
    while len(result)+1 != index:
        result.append( None )
    result.append( SomeClass( value ) )
return result

This is easily replaced by a generator. The API changes, so every use of this method function may need to be modified to use the generator instead of the list object.

def something_iter( self ):
 counter= 0
 for index, value in some_source:
     while counter+1 != index:
         yield None
         counter += 1
     yield SomeClass( value )
     counter += 1

The generator was significantly faster than list assembly.

Two minor code changes and a significant speed-up.

7 comments:

Kurt RoseDecember 29, 2010 at 1:10 PM
Interesting. You may consider just using a list comprehension or generator expression as well for that second piece:

[ [None]*index + [SomeClass(value)] for index,value in source ]

itertools.chain.from_iterable( ( itertools.chain( itertools.repeat(None, index), [SomeClass(value)] ) for index, value in source ) )

This arguably simplifies the code by removing the explicit "counter" variable, and the nested loop.
ReplyDelete
Replies
FredDecember 29, 2010 at 4:36 PM
I take it index from some_source must be increasing and start at least with 1? (The latter because starting at zero, which seems more natural for a general index, results in an infinite loop: counter + 1 will never equal 0. Was this an error in simplifying for the blog?)

@Kurt: You can't eliminate counter as in either of those, as the number of Nones depends on the difference between successive indexes, not on index alone. Your code gives a different result.

And realizing the importance is the difference between successive indexes leads me to write (how to format code for blogspot?):

def something_iter():
..cur_index = 1 # instead of 0 for reason above
..for next_index, value in some_source:
....for _ in xrange(cur_index, next_index):
......yield None
....cur_index = next_index
....yield SomeClass(value)

I don't consider this any significant improvement over the while loop version, but I think it would help prevent misunderstandings similar to Kurt's.
ReplyDelete
Replies
S.LottDecember 29, 2010 at 5:29 PM
The index is 1-based. It's the column number from reading Excel spreadsheets.
ReplyDelete
Replies
Kurt RoseDecember 29, 2010 at 6:27 PM
This comment has been removed by the author.
ReplyDelete
Replies
Kurt RoseDecember 29, 2010 at 6:34 PM
This comment has been removed by the author.
ReplyDelete
Replies
Kurt RoseDecember 29, 2010 at 6:38 PM
This comment has been removed by the author.
ReplyDelete
Replies
Kurt RoseDecember 29, 2010 at 6:41 PM
Oh, I see. The data is the index and value from a column, and the code is to fill in the "missing" numbers with Nones?

A defaultdict may be perfect for this. Since source is already in the form of a series of a list of (index, value) tuples, we can just pass this straight to the constructor.

>>> source = [ (1, 'a'), (5, 'b') ]
>>> import collections
>>> data = collections.defaultdict(lambda: None, source)

(add a list comprehension to call SomeClass constructor: collections.defaultdict(lambda: None, [(k, SomeClass(v)) for k,v in source])

Then your code can just treat data as if it were a list for indexing.

>>> data[1]
'a'
>>> data[2]
>>> data[3]
>>> data[4]
>>> data[5]
'b'

If you want to make it into a real list (for slicing, etc) you can do this with a simple comprehension:

>>> datalist = [data[a] for a in xrange(max(data.keys())+1)]
>>> datalist
[None, 'a', None, None, None, 'b']

You could also replace this with a generator expression if you wanted to save memory I guess, but you may as well leave it as a defaultdict in that case.
ReplyDelete
Replies