Comments on S.Lott-Software Architect: Amazing Speedup

2010-12-29T18:41:11.579-05:00

Oh, I see. The data is the index and value from a column, and the code is to fill in the "missing" numbers with Nones?

A defaultdict may be perfect for this. Since source is already a list of (index, value) tuples, we can pass it straight to the constructor.

>>> source = [ (1, 'a'), (5, 'b') ]
>>> import collections
>>> data = collections.defaultdict(lambda: None, source)

To call the SomeClass constructor on each value, add a list comprehension: collections.defaultdict(lambda: None, [(k, SomeClass(v)) for k, v in source])

Then your code can just treat data as if it were a list for indexing.

>>> data[1]
'a'
>>> data[2]
>>> data[3]
>>> data[4]
>>> data[5]
'b'
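One thing to keep in mind: a defaultdict doesn't just return None for a missing index, it also stores it, so reading data[2] through data[4] above adds those keys to the dict. A minimal Python 3 sketch of that behavior, with dict.get as a lookup that leaves the dict alone:

```python
import collections

source = [(1, 'a'), (5, 'b')]
data = collections.defaultdict(lambda: None, source)

# .get() returns None for a missing key without inserting it.
print(data.get(3))   # None
print(sorted(data))  # [1, 5]

# Indexing a missing key calls the default factory AND stores the result.
print(data[3])       # None
print(sorted(data))  # [1, 3, 5]
```

As long as you only read indexes up to the largest real one, max(data.keys()) is unaffected, but len(data) stops counting only the real entries.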

If you want to make it into a real list (for slicing, etc.) you can do this with a simple comprehension:

>>> datalist = [data[a] for a in xrange(max(data.keys())+1)]
>>> datalist
[None, 'a', None, None, None, 'b']

You could also replace this with a generator expression if you wanted to save memory, I guess, but you may as well leave it as a defaultdict in that case.
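Since the index is 1-based (Excel column numbers, per the author's reply), a lazy variant can start at 1, which drops the placeholder None for index 0, and use .get so the lookups don't fill in the dict as a side effect. A Python 3 sketch, where range replaces xrange; column_values and result are illustrative names:

```python
import collections

source = [(1, 'a'), (5, 'b')]
data = collections.defaultdict(lambda: None, source)

# Lazy: each value is looked up only when the generator is consumed.
# Starting at 1 skips the unused index 0.
column_values = (data.get(i) for i in range(1, max(data) + 1))

result = list(column_values)
print(result)  # ['a', None, None, None, 'b']
```

With .get doing all the lookups, a plain dict(source) would work just as well as the defaultdict.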

2010-12-29T18:38:02.457-05:00

This comment has been removed by the author.

2010-12-29T18:34:08.611-05:00

This comment has been removed by the author.

2010-12-29T18:27:39.458-05:00

This comment has been removed by the author.

2010-12-29T17:29:38.903-05:00

The index is 1-based. It's the column number from reading Excel spreadsheets.

2010-12-29T16:36:17.569-05:00

I take it the index from some_source must be increasing and start at 1 or higher? (The latter because starting at zero, which seems more natural for a general index, results in an infinite loop: counter + 1 will never equal 0. Was this an error in simplifying the code for the blog?)

@Kurt: You can't eliminate counter the way either of those does, because the number of Nones depends on the difference between successive indexes, not on the index alone. Your code gives a different result.

Realizing that what matters is the difference between successive indexes leads me to write this (how do you format code on Blogspot?):

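def something_iter():
    cur_index = 1  # instead of 0, for the reason above; next position to fill
    for next_index, value in some_source:
        # one None for each skipped position cur_index .. next_index - 1
        for _ in xrange(cur_index, next_index):
            yield None
        cur_index = next_index + 1  # next_index is filled by the value below
        yield SomeClass(value)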

I don't consider this any significant improvement over the while loop version, but I think it would help prevent misunderstandings similar to Kurt's.
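For anyone who wants to run the generator idea, here is a self-contained Python 3 sketch. SomeClass isn't shown in the thread, so a hypothetical make parameter stands in for it (identity by default), and fill_gaps is likewise an illustrative name. Once the value for next_index has been yielded, the next position to fill is next_index + 1, so that is where the counter has to resume.

```python
def fill_gaps(source, make=lambda value: value):
    """Yield make(value) at each 1-based index, with None in the gaps.

    source: (index, value) pairs with strictly increasing indexes >= 1.
    make: hypothetical stand-in for the SomeClass constructor.
    """
    cur_index = 1  # next 1-based position to be filled
    for next_index, value in source:
        # One None for each skipped position cur_index .. next_index - 1.
        for _ in range(cur_index, next_index):
            yield None
        yield make(value)
        cur_index = next_index + 1  # position next_index is now filled


print(list(fill_gaps([(1, 'a'), (5, 'b')])))  # ['a', None, None, None, 'b']
print(list(fill_gaps([(2, 'x'), (3, 'y')], make=str.upper)))  # [None, 'X', 'Y']
```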

2010-12-29T13:10:57.074-05:00

Interesting. You may consider just using a list comprehension or generator expression as well for that second piece:

[ [None]*index + [SomeClass(value)] for index,value in source ]

itertools.chain.from_iterable( ( itertools.chain( itertools.repeat(None, index), [SomeClass(value)] ) for index, value in source ) )

This arguably simplifies the code by removing the explicit "counter" variable, and the nested loop.
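The point raised in the reply to Kurt applies to this style too: with SomeClass as the identity and the sample source [(1, 'a'), (5, 'b')], the list comprehension returns a list of lists, [[None, 'a'], [None, None, None, None, None, 'b']], because it pads by the index itself rather than by the gap since the previous index. A sketch that keeps the chain/repeat approach but pairs each index with its predecessor (Python 3; fill_gaps_chain and make are hypothetical names, with make standing in for SomeClass):

```python
from itertools import chain, repeat

def fill_gaps_chain(source, make=lambda value: value):
    """Flatten (index, value) pairs into one list with None in the gaps.

    Assumes 1-based, strictly increasing indexes, as in the Excel use case.
    """
    source = list(source)
    # Predecessor of each index; 0 for the first, so index 1 gets no padding.
    prev_indexes = [0] + [index for index, _ in source[:-1]]
    return list(chain.from_iterable(
        chain(repeat(None, index - prev - 1), [make(value)])
        for prev, (index, value) in zip(prev_indexes, source)
    ))


print(fill_gaps_chain([(1, 'a'), (5, 'b')]))  # ['a', None, None, None, 'b']
```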