Generate specific combinations in Python

Question:

I have a list [1,2,3,4,5,6,7,8,9] and I would like to get all the lists that arise as a result of grouping its elements in pairs and one alone, regardless of the order of the pairs. That is, I expect something like:

[[1,2],[3,4],[5,6],[7,8],[9]],
[[1,2],[3,4],[5,6],[7,9],[8]],
[[1,2],[3,4],[5,6],[8,9],[7]],
[[1,2],[3,4],[5,7],[6,8],[9]],
[[1,2],[3,4],[5,7],[6,9],[8]],
...

As @abulafia indicates, the total number of lists in the case of 9 elements would be:

C(9,2) * C(7,2) * C(5,2) * C(3,2)

where C (n, m) are the combinations of n elements taken from m to m. With 22680 possibilities as a result, quite far from the 36 that it indicated.

A greeting and thanks in advance

Answer:

There are many more combinations of 36. In fact I get 22680. ( Update All this is incorrect. The final answer is under "Update 2". However, I leave the original answer so that you can see the process of how the good answer)

It has been difficult for me to come up with an algorithm that generates them all, and I have had to resort to recursion. Still the algorithm is quite difficult to understand. I present you for the moment the code without explanations and an example of the result of its execution for a list of only 5 numbers, which already produces 30 cases. The explanations later.

Code

from itertools import combinations

def todas_las_parejas(elementos):
  if len(elementos) <=2:
    return [[list(elementos)]]
  else:
    result = []
    for pareja in combinations(elementos, 2):
      for resto in todas_las_parejas(set(elementos)-set(pareja)):
        caso = [list(pareja)]
        caso.extend(resto)
        result.append(caso)
    return result

Execution example:

resultado = todas_las_parejas([1,2,3,4,5])
print(len(resultado))
for caso in resultado:
  print(caso)

30
[[1, 2], [3, 4], [5]]
[[1, 2], [3, 5], [4]]
[[1, 2], [4, 5], [3]]
[[1, 3], [2, 4], [5]]
[[1, 3], [2, 5], [4]]
[[1, 3], [4, 5], [2]]
[[1, 4], [2, 3], [5]]
[[1, 4], [2, 5], [3]]
[[1, 4], [3, 5], [2]]
[[1, 5], [2, 3], [4]]
[[1, 5], [2, 4], [3]]
[[1, 5], [3, 4], [2]]
[[2, 3], [1, 4], [5]]
[[2, 3], [1, 5], [4]]
[[2, 3], [4, 5], [1]]
[[2, 4], [1, 3], [5]]
[[2, 4], [1, 5], [3]]
[[2, 4], [3, 5], [1]]
[[2, 5], [1, 3], [4]]
[[2, 5], [1, 4], [3]]
[[2, 5], [3, 4], [1]]
[[3, 4], [1, 2], [5]]
[[3, 4], [1, 5], [2]]
[[3, 4], [2, 5], [1]]
[[3, 5], [1, 2], [4]]
[[3, 5], [1, 4], [2]]
[[3, 5], [2, 4], [1]]
[[4, 5], [1, 2], [3]]
[[4, 5], [1, 3], [2]]
[[4, 5], [2, 3], [1]]

Discussion of the result

As you can see, the 30 cases that appear are not the 10 you would expect if you applied the formula of "combinations of 5 elements taken 2 by 2". And it is that the combinations of 5 elements of 2 in 2 yes that are indeed 10, these 10:

(1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3 , 5), (4, 5)

But those 10 are not the whole story. These are only the 10 possible ways to start each of the (30) cases of the result. Once we have chosen one of these combinations, for example (1,2), there is still the rest of the list (3,4,5) from which we can again take combinations of 2 by 2 (other 3 cases: (3 , 4), (3,5) and (4,5)). In this example, in which the list has 5 elements, the thing ends there, since once one of those second cases has been chosen, there is only one possible element to choose from.

Hence, the 30 cases come from there. We have 10 possible ways to start and for each of them another 3 possible ways to continue.

In the case of having 9 numbers, the formula basically becomes:

C (9.2) * C (7.2) * C (5.2) * C (3.2)

Being C (n, m) the number of combinations of n elements taken from m in m. If you do the numbers it comes out:

36 * 21 * 10 * 3 = 22680

which are what the code that I put above actually produces.

Explanation of the code

The code uses recursion, which basically translates to:

  1. Solve the problem for a trivial case: when the list has only two numbers or less.
  2. Solve the problem for any other list, assuming we already have a function that does it for us for a smaller list.

The code does the following:

  1. In the case that the received list has two elements or less, it simply returns a list of lists that has the pair of elements inside (or a single element). For example, if you receive as a parameter a list with [2,3], it would return [[[2,3]]] . If it receives a list with 9 as a parameter, it would return [[[9]]] .

    If you notice, that answer is correct. It contains all the ways to pair the list that you have received as a parameter in pairs.

  2. In another case we must build a list with all possible cases. For it:

    • We list (thanks to itertools.combinations() ) all the possible ways to start our case (which are all the combinations from the given list, taking their elements two by two)
    • For each of those pairs:
    • We remove that pair from the original list (using set arithmetic to make it simpler)
    • We call the function that we assume already knows how to solve the problem for a smaller list.
    • The result of calling this function will be a list with all the ways to group the elements of the sublist in pairs. So we create a new case for each of the returned cases. Each new case begins with the couple in question and continues with each of the cases that the "magic" function has returned to us.
    • We are adding to a list each case generated
    • Finally we return that list

It only remains to write the "magic" function that knows how to solve the problem for a sublist. But, thanks to the wonders of recursion, it turns out that we already have that function, it's the one we just wrote!

Although it may seem incredible (recursion always seems to me), it works. It is enough that the function calls itself.

Upgrade

The above code generates cases that differ only in the order of the joins. For example, the above outputs include:

[[1, 4], [3, 5], [2]]
[[3, 5], [1, 4], [2]]

Update 2 The following is wrong. Contains a subtle bug that I explain and fix later, in Update 2

To avoid this, you have to modify the code slightly. Instead of making each pair a list, I'll make it a tuple. The difference in our case is that tuples can be put into sets, and lists cannot. Thus each "case" can be reduced to a set and in them the order does not matter, so two cases that only differ in order will give rise to the same set. I'm adding those sets to the result if they weren't already there before.

A problem with sets is that they do not have internal order either when displaying them, so the final results could show cases like [(1,), (3,4), (2,5)] , that is, the "lonely" element would no longer necessarily appear at the end.

To avoid that "aesthetic" detail, I keep two lists. One of normal results (the same as that of the initial solution) and another of sets. I use this second one only to avoid putting repeated cases in the first one.

This is the new code:

from itertools import combinations

def todas_las_parejas(elementos):
  if len(elementos) <=2:
    return [[tuple(elementos)]]
  else:
    result = []
    diferentes = []
    for pareja in combinations(elementos, 2):
      for resto in todas_las_parejas(set(elementos)-set(pareja)):
        caso = [tuple(pareja)]
        caso.extend(resto)
        if set(caso) not in diferentes:
          result.append(caso)
          diferentes.append(set(caso))
    return result

And now when executing for [1,2,3,4,5] only 15 cases appear:

[(1, 2), (3, 4), (5,)]
[(1, 2), (3, 5), (4,)]
[(1, 2), (4, 5), (3,)]
[(1, 3), (2, 4), (5,)]
[(1, 3), (2, 5), (4,)]
[(1, 3), (4, 5), (2,)]
[(1, 4), (2, 3), (5,)]
[(1, 4), (2, 5), (3,)]
[(1, 4), (3, 5), (2,)]
[(1, 5), (2, 3), (4,)]
[(1, 5), (2, 4), (3,)]
[(1, 5), (3, 4), (2,)]
[(2, 3), (4, 5), (1,)]
[(2, 4), (3, 5), (1,)]
[(2, 5), (3, 4), (1,)]

For the 9 digits the thing has been reduced to 2235 cases. But I couldn't tell you what the general formula is where this number comes from (another challenge! Damn it!)

Update 2

The previous code has a bug (I was already surprised that the final number of combinations for the 9 digits was 2235, which does not factor well since it is 3x5x149, a strange trio of cousins ​​that did not look good at all).

The bug becomes patent if we generate the list of combinations for the case [4,5,6,7,8] , instead of [1,2,3,4,5] . Obviously the same number of combinations should come out (15), but instead 27 come out. Examining the results I find that there appear "duplicate" cases that shouldn't be:

[(4, 5), (8, 6), (7,)]
[(6, 8), (4, 5), (7,)]

The problem here is that the pair (8,6) is considered different from the pair (6,8) , so that the set of different combinations considers them in fact two different valid combinations.

The case did not appear when we used [1,2,3,4,5] as input because coincidentally for that case all the tuples generated were in increasing order. That is, in each generated tuple (x,y) was true that x<y . Therefore, the version (y,x) of that same tuple never appeared.

This behavior can be considered an accident. Actually we have no guarantees of the order in which the tuples will come out, because when we recursively call the function we are no longer passing a list, but a set ( set(elementos)-set(pareja) ). itertools.combinations() iterate over the elements of that set to generate pairs, but a set does not give guarantees about the order in which it will return its elements, so it could have returned as the first pair both (3,4) and (4,3) .

This accidental behavior of the list [1,2,3,4,5] by which the tuples were generated casually ordered, also appears for other lists. But instead it disappears in the list [4,5,6,7,8] where unordered tuples are beginning to be seen and therefore are not recognized as repetitions.

Bug fix

For the tuple (x,y) be considered the same as the tuple (y,x) it is best to stop using tuples to represent combinations, and also use sets.

Apparently therefore it would be enough to change in the code before seen all the occurrences of the word tuple() by set() . However, it is not that easy.

Problems:

  • A tuple can be a member of another set, but a set() not. Not all python data types can be members of a set. Only the ones that are hasable (and in particular immutable). This was the reason why I had swapped lists for tuples in the first place, in order to fit them into sets. If I change it back to set() I can no longer put it in a set, which I need to recognize repeated cases.
  • The problem is fixed if instead of set() use frozenset() to represent each pair. This is a special type of assembly to which you cannot add or remove items. That is, an immutable set, which is therefore hashable and can be part of other sets.
  • The only problem with frozenset() is that it makes the program output very dirty, because now then a possible combination would be displayed like this when printed:

     [frozenset({1, 2}), frozenset({3, 4}), frozenset({5})]

    instead of like this:

     [{1, 2}, {3, 4}, {5}]

Luckily it is easy to define how you want your own class to be printed. In the following code I define my MySet class, which inherits from frozenset() , but redefines the __repr__() method to make the screen output more compact and readable. I use this class where before I used tuple() .

from itertools import combinations

class MySet(frozenset):
  def __repr__(self):
    return "{%s}" % (", ".join(str(e) for e in self))


def todas_las_parejas_sin(elementos):
  if len(elementos) <=2:
    return [[MySet(elementos)]]
  else:
    result = []
    diferentes = set()
    for pareja in combinations(elementos, 2):
      for resto in todas_las_parejas_sin(set(elementos)-set(pareja)):
        caso = [MySet(pareja)]
        caso.extend(resto)
        if MySet(caso) not in diferentes:
          result.append(caso)
          diferentes.add(MySet(caso))
    return result

Execution example:

for caso in todas_las_parejas_sin([1,2,3,4]):
  print(caso)

[{1, 2}, {3, 4}]
[{1, 3}, {2, 4}]
[{1, 4}, {2, 3}]

Now it works correctly for the case where the input is [4,5,6,7,8] , yielding 15 combinations instead of 27.

And for the case where the input is [1,2,3,4,5,6,7,8,9] the number of generated combinations is only 945 (instead of 2235).

Bonus

I have found the formula for the total number of combinations to come out. It is enough to divide the originally calculated number (22680) by a factorial of 4 (24) and it comes out 945. This is because in each case there are 4 pairs, whose order does not matter and therefore we divide by the number of permutations of those 4 elements.

The general formula would be to divide by the factorial of the number of pairs that are formed for each case. That is to say:

def num_casos(n):
  resultado = 1
  n_parejas = 0
  while n>1:
    resultado *= C(n, 2) # Combinaciones de n elementos tomados de 2 en 2
    n -= 2
    n_parejas += 1
  return resultado/factorial(n_parejas)

(It would be necessary to implement the function C(n,m) and factorial(n) )

Scroll to Top