Bio and Publications

Tuesday, October 6, 2009

Flattening Nested Lists -- You're Doing It Wrong

On StackOverflow you can read numerous questions on "flattening" nested lists in Python.

They all have a similar form.

"How do I flatten this list [ [ 1, 2, 3 ], [ 4, 5, 6 ], ... , [ 98, 99, 100 ] ]?"

The answers include list comprehensions, itertools, and other clever variants.

All Much of which is simply wrong inappropriate.

You're Doing it Wrong

The only way to create a nested list is to append a list to a list.

theList.append( aSubList )

You can trivially replace this with the following

theList.extend( aSubList )

Now, your list is created flat. If it's created flat, you never need to flatten it.

Obscure Edge Cases

Sometimes it may be necessary to have both a flattened and an unflattened list. I'm unclear on when or how this situation arises, but this may be edge case that makes some of itertools handy.

For the past 3 decades, I've never seen the "both nested and not nested" use case, so I can't fathom why or how this would arise.

Visiting a Tree

Interestingly, a tree visitor has a net effect somewhat like "flattening". However, it does not actually create an intermediate flat structure. It simply walks the structure as it exists. This isn't a proper use case for transforming a nested list structure to a flat structure. Indeed, this is a great example of why nested structures and flat structures are quite separate use cases.

13 comments:

  1. This is pretty naive. The times I've needed to flatten a list were because I needed the non-flattened form for one thing, and the flattened form for another thing. It's not like we just don't know about extend versus append.

    ReplyDelete
  2. so, remove itertools.chain() from the standard lib ?

    ReplyDelete
  3. I think we can safely remove itertools altogether after this.

    ReplyDelete
  4. Naive indeed. Besides the nested version being useful as well, what about working with data structures that one didn't create. :/ Uninspired.

    ReplyDelete
  5. I often have lists of objects each of which has an attribute which holds a list of different objects. This is nested list like situation which I can flatten using minor alterations to the solutions provided on Stack Overflow. I don't think your suggestion would apply here.

    OTOH, I've generally noticed that extend is used a lot lesser than append.

    ReplyDelete
  6. List.append is not even close to the only way to create nested lists. The first counterexample that springs to mind is when you map a function over a list where the result of the function is a list.

    ReplyDelete
  7. Putting nesting aside, I find myself using the following rather than [].extend([]):

    >>> L = [1,2,3]
    >>> L += [4,5,6]
    >>> L
    [1, 2, 3, 4, 5, 6]

    ReplyDelete
  8. @Michael Watkins: I would be careful about getting into that habit. If you do that in a loop, you create a bunch of intermediate lists and it'll get slow really fast.

    ReplyDelete
  9. @Michael: Are you sure it creates intermediate lists? After testing in Python 2.6.2, the += operator extends the existing list, and does not create a new one. The id() of the list does not change at any point in the loop. However, if you write 'L = L + newitems' you do indeed get a bunch of new objects.

    ReplyDelete
  10. > The only way to create a nested list is to append a list to a list.

    Note quite true. One case I had the other day was like:

    # got input data like this:
    >>> row = ["Apples", "100", "40 +- 10", "50 +- 4"]
    >>> row = [c.Split("+-") for c in row]
    >>> row
    [['Apples'], ['100'], ['40 ', ' 10'], ['50 ', ' 4']]

    Where I wanted to flatten "row" to:
    ['Apples', '100', '40 ', ' 10', '50 ', ' 4']

    (Later stripping whitespace and converting everything where applicable to float.)

    I ended up naming each column explicitly, something like this:
    >>> a,b,c,d = row
    >>> c1, c2 = c.Split("+-")
    ...

    ... which is in the real code pretty verbose and ugly. Wish Python had a list.flatten(optional_depth) method...

    ReplyDelete
  11. This comment has been removed by the author.

    ReplyDelete
  12. (Reposted with slightly better formatting. I've made a mental note that typing
    reStructuredText into arbitrary text boxes doesn't usually end well.)

    People were extremely defensive about this post for some reason. I think it's an excellent point -- if you don't need the nested structure for something else, you should be creating a flat sequence to begin with.

    As a corollary, you may prefer a for-loop to a list comprehension if you use a transform function that creates a list, but want a flattened result at the end. For example,

    flat = []
    for item in seq:
    flat += transform(item)

    As preferable to:

    nested = (transform(item) for item in seq)
    flat = itertools.chain.from_iterable(nested)

    The former certainly gets the point across with less jargon, if ``nested`` is created by you (as opposed to being passed in from elsewhere).

    ReplyDelete
  13. @Nacho

    Your list comprehension is, in effect, doing an append. You should just use a normal loop:

    row = [...]
    stuff = []
    for item in row:
    ....stuff += item.split('+-')

    Reusing the name row isn't saving you anything, and this version is clearer, *and* it gets you what you want.

    ReplyDelete

Note: Only a member of this blog may post a comment.