Python: split list into list of lists, by delimiter element

Question:

Python 3.2.

There is a list lst whose values are interspersed with delimiter elements, for example ["spam", "ham", None, "eggs", None, None, "bacon"]. I want to get a list of lists by splitting lst on the delimiter sep = None, i.e., get [["spam", "ham"], ["eggs"], ["bacon"]].

I looked through the standard library but didn't find anything similar. PyPI is hard to search, and a quick pass turned up nothing either. A brazen attempt to exploit str.split, of course, failed with a TypeError.

Please suggest a more elegant solution than this eyesore of a monster. I don't want to feel like Frankenstein.

from functools import reduce

# Fugly.
def split_on(sep, lst):
    """
    Given an iterable `lst`, split it into iterable of lists by `sep`.

    >>> list(split_on(0, [1, 2, 3, 0, 4, 5, 0, 0, 6]))
    [[1, 2, 3], [4, 5], [6]]
    """
    # Accept either a predicate or a plain delimiter value.
    s = sep if callable(sep) else lambda x: x == sep
    return filter(lambda sublist: len(sublist) > 0,
                  reduce(lambda x, elem: x + [[]] if s(elem)
                                                else x[:-1] + [x[-1] + [elem]],
                         lst, [[]]))

Answer:

  • Apparently, functional programming has left a serious imprint on you 🙂

I'll note right away that split semantics, applied to your example, imply the result [["spam", "ham"], ["eggs"], [], ["bacon"]]: in delimiter terms, there is an empty list between the two consecutive None values.

  • So, there are several possible solutions. The most explicit one looks something like this:

     def split_on(what, delimiter=None):
         splitted = [[]]
         for item in what:
             if item == delimiter:
                 splitted.append([])
             else:
                 splitted[-1].append(item)
         return splitted

  • This solution is correct up to the function's contract for two edge cases: an empty list, [], and a list consisting only of a delimiter, [None].

  • I defined the contract as [] -> [[]] and [None] -> [[], []]. The first case is debatable. The second follows because there is, in effect, an empty sequence on each side of the delimiter.

  • If you want different behavior, modifying the function should not be difficult.

  • Usage example:

     list1 = ["spam", "ham", None, "eggs", None, None, "bacon"]
     list2 = []
     list3 = [None]
     list4 = ["eggs"]

     print(split_on(list1))
     print(split_on(list2))
     print(split_on(list3))
     print(split_on(list4))

     # Output:
     # [['spam', 'ham'], ['eggs'], [], ['bacon']]
     # [[]]
     # [[], []]
     # [['eggs']]

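
  • If the empty groups are unwanted, as in the expected output from the question, the contract can be changed with a small adjustment. The following is only a sketch: the name split_on_nonempty is hypothetical, and it assumes that [] and [None] should both produce an empty result.

```python
def split_on_nonempty(what, delimiter=None):
    """Split `what` on `delimiter`, dropping empty groups (hypothetical variant)."""
    groups = [[]]
    for item in what:
        if item == delimiter:
            # Start a new group only if the current one has content,
            # so runs of delimiters do not produce empty lists.
            if groups[-1]:
                groups.append([])
        else:
            groups[-1].append(item)
    # A trailing delimiter (or empty input) leaves one empty group at the end.
    if not groups[-1]:
        groups.pop()
    return groups
```

With this variant, the list from the question yields [['spam', 'ham'], ['eggs'], ['bacon']].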
  • As for alternatives, you could write a generator similar to the function above using yield, and I think you could also come up with a solution that breaks the iterable into groups with groupby from itertools and then combines the results. That said, it seems to me that these solutions would be somewhat less obvious than the method proposed above.
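
  • For illustration, the two alternatives mentioned above might look roughly like this. Both function names (iter_split_on, split_on_groupby) are made up for this sketch. Note that the groupby version follows a different contract: it collapses runs of delimiters, so it never produces empty groups.

```python
from itertools import groupby


def iter_split_on(what, delimiter=None):
    """Lazily yield the groups of `what` separated by `delimiter`."""
    group = []
    for item in what:
        if item == delimiter:
            yield group
            group = []
        else:
            group.append(item)
    # The final group is yielded even when empty, mirroring split_on.
    yield group


def split_on_groupby(what, delimiter=None):
    """Split using itertools.groupby; empty groups are dropped."""
    return [list(group)
            for is_delimiter, group in groupby(what, lambda item: item == delimiter)
            if not is_delimiter]
```

The generator keeps the same contract as the explicit function (wrap it in list() to compare the results), while the groupby version is closer to what the question originally asked for.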
