Master Kruskal’s Algorithm in 5 Steps with Python

What will we cover in this tutorial?

In this tutorial we will show how Kruskal’s algorithm works. First, we will just give an example of what it does.

Consider the following undirected edge-weighted graph. The vertices (or nodes) are numbered from 0 to 7 to make it simpler later. It contains weighted edges. That means that the weight of the edge between vertex (or node) 0 and 1 is 6. Similarly, the weight between vertex (or node) 0 and 2 is 3.

An undirected edge-weighted graph.

Our goal is to find a minimum spanning tree (or a minimum spanning forest if the graph is not connected). That means we need to find a subset of the edges which connects all vertices (or nodes) with the minimum total weight.

Minimum spanning tree of the above graph.

Notice that all the vertices (nodes) are connected and that we have chosen the edges with the lowest weights between them.

That is the challenge we want to solve. Given any undirected edge-weighted graph, find the minimum spanning tree of that graph.

In this tutorial we will understand and implement Kruskal’s algorithm, which solves the above problem for an undirected edge-weighted graph.

Step 1: High-level understanding of Kruskal’s algorithm

On Wikipedia, the following pseudocode is given.

algorithm Kruskal(G) is
    F:= ∅
    for each v ∈ G.V do
        MAKE-SET(v)
    for each (u, v) in G.E ordered by weight(u, v), increasing do
        if FIND-SET(u) ≠ FIND-SET(v) then
            F:= F ∪ {(u, v)}
            UNION(FIND-SET(u), FIND-SET(v))
    return F

Which is actually quite nice. So let us first try to understand what happens in the code. Below we describe the pseudocode in bullets 1 to 5.

  1. The algorithm Kruskal takes an argument G, which represents the graph.
  2. It creates a set F to be the empty set. The set F represents the forest, and we will add edges to it as the algorithm iterates over the edges later.
  3. Then it creates a set for each vertex (or node) in the graph. These sets need to be represented somehow, which is not given by the algorithm. Also, each set initially contains only one vertex. As the algorithm iterates along, it will join (take the union of) the sets as the vertices are connected.
  4. Then we iterate over the edges, ordered by increasing weight. That is, we take the edge with the lowest weight first (if several have the same weight, we just take any of them).
    • It will check whether the vertices (u and v) that the edge connects are in the same set.
    • If they are not in the same set, it will add the edge to the forest, F, and join the two sets into one.
  5. After iterating over all edges, we return the forest F.

Step 2: A visual walk-through of Kruskal’s algorithm

Still a lot to figure out. Let’s try with an example. The illustration below depicts the graph, G, given as argument to Kruskal’s algorithm in bullet 1 (in the previous step of the tutorial).

The graph we consider.

At bullet 2, we create an empty set, F, to represent the solution.

Then at bullet 3 we make singleton sets of all the vertices. We can represent that with a graph with no edges.

Set of vertices

At bullet 4 we start iterating over the edges. There we will take the edge with the lowest weight in G. As there are two with weight 1, we can take either. Below we represent the sets as groups of connected vertices (only vertex 5 and 7 are connected); the rest are sets of a single vertex. The solution, F, is represented by the edges in the diagram.

F = [(5-7)], sets=[(0), (1), (2), (3), (4), (5, 7), (6)]

Then we take the next edge; again it is an edge with weight 1. We check whether its endpoints are in the same set. As they are not, we add the edge and join the sets.

F = [(5-7), (2-4)], sets=[(0), (1), (2, 4), (3), (5, 7), (6)]

And we continue like that.

F = [(5-7), (2-4), (3-7)], sets=[(0), (1), (2, 4), (3, 5, 7), (6)]

Again.

F = [(5-7), (2-4), (3-7), (1-2)], sets=[(0), (1, 2, 4), (3, 5, 7), (6)]

Again.

F = [(5-7), (2-4), (3-7), (1-2), (5-6)], sets=[(0), (1, 2, 4), (3, 5, 6, 7)]

Again.

F = [(5-7), (2-4), (3-7), (1-2), (5-6), (4-7)], sets=[(0), (1, 2, 3, 4, 5, 6, 7)]

Again.

F = [(5-7), (2-4), (3-7), (1-2), (5-6), (4-7), (0-2)], sets=[(0, 1, 2, 3, 4, 5, 6, 7)]

Now the algorithm will continue to iterate over the edges, but none of them will be added: there is only one set left, so no edge can connect two different sets.

Hence, it will return F = [(5-7), (2-4), (3-7), (1-2), (5-6), (4-7), (0-2)] in bullet 5 after the last iteration of the loop.

Step 3: Representing the graph

We need two classes to implement Kruskal’s algorithm. One for representing the graph and one for representing the sets.

Let’s start with the graph. What is relevant in a graph? Let’s take a look at it again.

Here are the features of the class.

  1. It contains vertices numbered from 0 to 7. That is, we need to know how many vertices there are.
  2. It contains edges between the vertices. Each edge can be represented by two vertex indices, and it should have a weight.
  3. In the pseudocode, we also see that we need to access the number of vertices and get the edges sorted by weight in increasing order.

Now how do you do that?

class Graph:
    def __init__(self, vertices):
        self.vertices = vertices  # the number of vertices
        self.edges = []  # a list of (u, v, weight) tuples
    def add_edge(self, u, v, weight):
        self.edges.append((u, v, weight))
    def sort_edges(self):
        # Sort the edges by weight in increasing order.
        self.edges = sorted(self.edges, key=lambda item: item[2])
    def get_edges(self):
        return self.edges
    def number_of_vertices(self):
        return self.vertices

In this way, we can create a graph with a number of vertices and add edges. When we have done that, we can sort the edges and get them back. Also, we can get the number of vertices.

We can represent the above graph with the following code.

g = Graph(8)
g.add_edge(0, 1, 6)
g.add_edge(0, 2, 3)
g.add_edge(1, 2, 2)
g.add_edge(1, 3, 4)
g.add_edge(2, 3, 8)
g.add_edge(2, 4, 1)
g.add_edge(3, 4, 4)
g.add_edge(3, 6, 9)
g.add_edge(3, 7, 2)
g.add_edge(4, 5, 5)
g.add_edge(4, 7, 3)
g.add_edge(5, 6, 3)
g.add_edge(5, 7, 1)
g.add_edge(6, 7, 4)

That should represent the graph we have used in this tutorial.

Step 4: Representing the sets in the algorithm

A simple temptation would be to represent the sets in the algorithm as a Python list of sets. But that would not give the desired time complexity of the algorithm.

Hence, we need to look at what is required. Wikipedia says the algorithm should be implemented with a disjoint-set data structure. The disjoint-set data structure has very efficient lookups of which set a vertex belongs to, as well as fast unions. That is needed to keep the complexity of Kruskal’s algorithm at O(E log(V)), where E is the number of edges and V the number of vertices.

The disjoint-set data structure supports find and union operations in O(log(V)) time or faster. Hence, it will keep the time complexity at O(E log(V)). More about that later.

First, here is an implementation of the disjoint-set data structure in Python.

class DisjointSet:
    def __init__(self, size):
        # Initially every node is its own parent, i.e., its own set.
        self.parent = []
        self.rank = []
        for node in range(size):
            self.parent.append(node)
            self.rank.append(0)
    def find(self, v):
        # Follow parent pointers until we reach the root of the set.
        if self.parent[v] == v:
            return v
        else:
            return self.find(self.parent[v])
    def union(self, u, v):
        # Union by rank: attach the tree of lower rank under the root
        # of the tree of higher rank, to keep the trees shallow.
        u_root = self.find(u)
        v_root = self.find(v)
        if self.rank[u_root] < self.rank[v_root]:
            self.parent[u_root] = v_root
        elif self.rank[u_root] > self.rank[v_root]:
            self.parent[v_root] = u_root
        else:
            self.parent[v_root] = u_root
            self.rank[u_root] += 1

We will not go into details of the time complexities of this data structure.

Notice the following.

  1. The constructor creates as many singleton sets as given by the size argument.
  2. Find returns the root of the set. There is one uniquely defined vertex in each set that is the root. Hence, for any two vertices in the same set, the find function will always return the same root.
  3. The union function joins two sets based on a rank, in order to keep the time complexities low. See the usage sketch below.
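
To get a feel for the data structure, here is a minimal usage sketch (not part of the original tutorial; it assumes the DisjointSet class above).

ds = DisjointSet(4)              # four singleton sets: {0}, {1}, {2}, {3}
print(ds.find(0) == ds.find(1))  # False: 0 and 1 are in different sets
ds.union(0, 1)                   # join the sets containing 0 and 1
print(ds.find(0) == ds.find(1))  # True: they now share the same root
ds.union(2, 3)
ds.union(1, 3)                   # joins {0, 1} and {2, 3} into one set
print(ds.find(0) == ds.find(2))  # True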

Step 5: Kruskal’s algorithm

Now we have the understanding and the two helper classes for Kruskal’s algorithm. The implementation is then almost identical to the pseudocode from Wikipedia. The pseudocode is given here for comparison.

algorithm Kruskal(G) is
    F:= ∅
    for each v ∈ G.V do
        MAKE-SET(v)
    for each (u, v) in G.E ordered by weight(u, v), increasing do
        if FIND-SET(u) ≠ FIND-SET(v) then
            F:= F ∪ {(u, v)}
            UNION(FIND-SET(u), FIND-SET(v))
    return F

And the actual Python code is given here.

def kruskal(graph):
    forest = []
    graph.sort_edges()
    disjoint_set = DisjointSet(graph.number_of_vertices())
    for (u, v, weight) in graph.get_edges():
        if disjoint_set.find(u) != disjoint_set.find(v):
            forest.append((u, v, weight))
            disjoint_set.union(u, v)
    return forest

Now let’s try our example given in this tutorial.

Below you have the full code.

class Graph:
    def __init__(self, vertices):
        self.vertices = vertices
        self.edges = []
    def add_edge(self, u, v, weight):
        self.edges.append((u, v, weight))
    def sort_edges(self):
        self.edges = sorted(self.edges, key=lambda item: item[2])
    def get_edges(self):
        return self.edges
    def number_of_vertices(self):
        return self.vertices

class DisjointSet:
    def __init__(self, size):
        self.parent = []
        self.rank = []
        for node in range(size):
            self.parent.append(node)
            self.rank.append(0)
    def find(self, v):
        if self.parent[v] == v:
            return v
        else:
            return self.find(self.parent[v])
    def union(self, u, v):
        u_root = self.find(u)
        v_root = self.find(v)
        if self.rank[u_root] < self.rank[v_root]:
            self.parent[u_root] = v_root
        elif self.rank[u_root] > self.rank[v_root]:
            self.parent[v_root] = u_root
        else:
            self.parent[v_root] = u_root
            self.rank[u_root] += 1

def kruskal(graph):
    forest = []
    graph.sort_edges()
    disjoint_set = DisjointSet(graph.number_of_vertices())
    for (u, v, weight) in graph.get_edges():
        if disjoint_set.find(u) != disjoint_set.find(v):
            forest.append((u, v, weight))
            disjoint_set.union(u, v)
    return forest

g = Graph(8)
g.add_edge(0, 1, 6)
g.add_edge(0, 2, 3)
g.add_edge(1, 2, 2)
g.add_edge(1, 3, 4)
g.add_edge(2, 3, 8)
g.add_edge(2, 4, 1)
g.add_edge(3, 4, 4)
g.add_edge(3, 6, 9)
g.add_edge(3, 7, 2)
g.add_edge(4, 5, 5)
g.add_edge(4, 7, 3)
g.add_edge(5, 6, 3)
g.add_edge(5, 7, 1)
g.add_edge(6, 7, 4)
result = kruskal(g)
for r in result:
    print(r)

This code will output the following.

(2, 4, 1)
(5, 7, 1)
(1, 2, 2)
(3, 7, 2)
(0, 2, 3)
(4, 7, 3)
(5, 6, 3)

These are the same edges as we identified manually in Step 2 of this tutorial.

Bonus: Complexity of the algorithm

Let’s have the algorithm here again.

def kruskal(graph):
    forest = []
    graph.sort_edges()
    disjoint_set = DisjointSet(graph.number_of_vertices())
    for (u, v, weight) in graph.get_edges():
        if disjoint_set.find(u) != disjoint_set.find(v):
            forest.append((u, v, weight))
            disjoint_set.union(u, v)
    return forest

We have that E is the number of edges and V is the number of vertices in the graph.

Then the sort of the edges takes O(E log(E)), but as E is upper bounded by V^2, it follows that O(log(E)) = O(log(V^2)) = O(2 log(V)) = O(log(V)). Hence, the sorting takes O(E log(V)).

The for-loop over the edges runs E times. In each iteration we call find and possibly union on the disjoint-set data structure. These calls are upper bounded by O(log(V)). Hence, the main loop runs in O(E log(V)).

All in all O(E log(V) + E log(V)) = O(E log(V)).

How-To Solve the Edit Distance Problem with Dynamic Programming

What will we cover in this tutorial?

In this tutorial we will make a dynamic programming solution to the Edit Distance Problem.

Consider the following problem. Given strings u = “abbabc” and v = “ababac”, and the operations to insert, remove, or change a character in string u, what is the minimum number of operations you need to transform u into v?

  1. Remove b: u = “abbabc” – remove b -> u' = “ababc”.
  2. Insert a: u' = “ababc” – insert a -> u'' = “ababac”.

Hence, with two operations we transform u into v.

How can you make a program that solves the general problem for two arbitrary strings, u and v, with minimal operations? And even better, how can we solve it with dynamic programming and how does that help us?

Step 1: Solve the Edit Distance Problem without Dynamic Programming

It is a good idea to solve a problem with a straightforward approach first. This helps you understand the problem at its heart.

To get started with solving a problem like this, it is often a good idea to look at some simple base cases.

Assume that u and v are either the empty string ('') or one character ('a', 'b', or 'c', to keep it simple).

We have some cases here.

  1. If we have that u = '' (the empty string) and v = 'a', then we need to insert 'a' into u to get v.
  2. If we have that v = '' and u = 'b', then we need to remove a character from u to get v.
  3. Then, if we have that u = 'a' and v = 'a', then we do not need to do anything, as they are equal.
  4. Finally, if we have that u = 'a' and v = 'b', then we can use the change operation to turn u into v.

Using that logic, we can turn that into a recursive function which considers the last character of the string with the above operations.

There is a catch, of course: we need to consider case 4. In that case, we should try all three options and use the best one.

Let’s try to write that as Python code.

def naive_edit_distance(u, v):
    if len(u) == 0:
        return len(v) # Number of insert operations to fill out.
    if len(v) == 0:
        return len(u) # Number of delete operations to remove chars
    if u[-1] == v[-1]: # Characters are identical, no operations
        return naive_edit_distance(u[:-1], v[:-1])
    return min(naive_edit_distance(u, v[:-1]) + 1, # insert operation
               naive_edit_distance(u[:-1], v) + 1, # delete operation
               naive_edit_distance(u[:-1], v[:-1]) + 1) # change operation

Now, let us first see how this works. Assume we call with the following arguments.

naive_edit_distance("ab", "bc")

Then let’s make a call tree to see how that would run with the above code.

Now the result will be propagated all the way up, where each inner node takes the minimum value of its children.

This shows that 2 operations is optimal. Also, notice that the operations are not unique. We see two paths giving the same number of operations.
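
We can confirm this by running the function (a quick check, assuming the naive_edit_distance function above):

print(naive_edit_distance("ab", "bc"))          # outputs: 2
print(naive_edit_distance("abbabc", "ababac"))  # outputs: 2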

Step 2: Solving the Edit Distance Problem with dynamic programming

If you investigate the above call tree, you will notice that some calls are identical. E.g., the call with u='a' and v='b' occurs twice, and we calculate the same 6 consecutive calls below it.

With dynamic programming you can avoid calculating the same calls again and again.

The easiest way to turn the above algorithm into a dynamic programming algorithm is by storing the value of each call in a global dictionary. Then for each calculated value, store it, and before making a new calculation, check whether it already exists.

This turns into the following code.

# A global dictionary to store already called values
results = {}

def get_distance_dynamic_programming(u, v):
    global results
    if (u, v) in results: # Check if call already is calculated
        return results[(u, v)]
    if len(u) == 0: 
        results[(u, v)] = len(v) # Number of insert operations to fill out.
    elif len(v) == 0:
        results[(u, v)] = len(u) # Number of delete operations to remove chars
    elif u[-1] == v[-1]: # Characters are identical, no operations
        results[(u, v)] = get_distance_dynamic_programming(u[:-1], v[:-1])
    else:
        results[(u, v)] = min(get_distance_dynamic_programming(u, v[:-1]) + 1, # insert operation
                              get_distance_dynamic_programming(u[:-1], v) + 1, # delete operation
                              get_distance_dynamic_programming(u[:-1], v[:-1]) + 1) # change operation
    return results[(u, v)]

Hence, the main difference is that we have a global dictionary, called results. At the beginning we check whether the value for the call (the arguments u and v) already exists; if so, we return it.

Otherwise, we do the same as before, except that we store the value in results before returning it.
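
As an aside, Python’s standard library can do this bookkeeping for us. Here is a minimal sketch using functools.lru_cache (the get_distance_cached name is ours; it is equivalent in effect to the global dictionary above):

from functools import lru_cache

@lru_cache(maxsize=None)
def get_distance_cached(u, v):
    if len(u) == 0:
        return len(v)  # number of insert operations to fill out
    if len(v) == 0:
        return len(u)  # number of delete operations to remove chars
    if u[-1] == v[-1]:  # characters are identical, no operation
        return get_distance_cached(u[:-1], v[:-1])
    return min(get_distance_cached(u, v[:-1]) + 1,       # insert
               get_distance_cached(u[:-1], v) + 1,       # delete
               get_distance_cached(u[:-1], v[:-1]) + 1)  # change

print(get_distance_cached("abbabc", "ababac"))  # outputs: 2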

Step 3: Return the operations used to transform the string

Imagine you need to return the operations you used to transform string u to v.

This can be done by the following code.

# Solution which returns the operations
operations = {}

def get_distance_operations(u, v):
    global operations
    if (u, v) in operations:
        return operations[(u, v)]
    if len(u) == 0:
        operations[(u, v)] = (len(v), [("insert", c) for c in v])
    elif len(v) == 0:
        operations[(u, v)] = (len(u), [("remove", c) for c in u])
    elif u[-1] == v[-1]:
        operations[(u, v)] = get_distance_operations(u[:-1], v[:-1])
    else:
        val1, ops1 = get_distance_operations(u, v[:-1])
        val2, ops2 = get_distance_operations(u[:-1], v)
        val3, ops3 = get_distance_operations(u[:-1], v[:-1])
        if val1 <= val2 and val1 <= val3:
            operations[(u, v)] = (val1 + 1, ops1 + [("insert", v[-1])])
        elif val2 <= val1 and val2 <= val3:
            operations[(u, v)] = (val2 + 1, ops2 + [("remove", u[-1])])
        else:
            operations[(u, v)] = (val3 + 1, ops3 + [("change", u[-1] + "->" + v[-1])])
    return operations[(u, v)]

res = get_distance_operations("abbabc", "ababac")
print(res)

The above code would return the following output.

(2, [('remove', 'b'), ('insert', 'a')])

That is amazing. Another cool problem solved.

The Greedy Approximation Algorithm for the Knapsack Problem

What will we cover in this tutorial?

The Knapsack Problem is defined as follows.

Given a set of items, each with a weight and a value, determine the number of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible.

https://en.wikipedia.org/wiki/Knapsack_problem

In this tutorial we will give an efficient algorithm, the greedy approximation algorithm, which gives at least 1/2 of the optimal value.

Step 1: Understand the Knapsack Problem

Imagine you have 3 items. Item 1 has weight 2 kg and value $20. The next item, item 2, has weight 1 kg and value $15. Finally, item 3 has weight 3 kg and value $35.

The items – weights in kg and values in USD.

The challenge is that you can only carry 4 kg. How do you get the most value with you?

You can take item 1 and item 2; then you have 3 kg for $45 of value. You cannot also take item 3, as that would add up to 6 kg, which is too much.

On the other hand, you could take item 2 and item 3; then you have 4 kg in total for $50.

The Knapsack Problem is to determine the most valuable items to take within some bound of weight.

Now this seems easy to solve, or does it? Actually, the Knapsack Problem is NP-complete, which means it is hard to solve efficiently.
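
For an instance this small, we can verify the answer by brute force. Here is a sketch that tries every subset of the three items above (the variable names are our own; this approach is exponential and only feasible for a few items):

from itertools import combinations

weights = [2, 1, 3]    # item 1, item 2, item 3 (kg)
values = [20, 15, 35]  # item 1, item 2, item 3 (USD)
W = 4                  # weight limit (kg)

best = 0
for r in range(len(weights) + 1):
    for subset in combinations(range(len(weights)), r):
        weight = sum(weights[i] for i in subset)
        value = sum(values[i] for i in subset)
        if weight <= W and value > best:
            best = value
print(best)  # outputs: 50 (item 2 and item 3)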

Step 2: A greedy approximation algorithm to solve it efficiently

To get the optimal solution is difficult, but there exist efficient approximations that are easy to implement.

Let’s take a look at the greedy approximation algorithm.

def greedy_algorithm(w, v, W):
    # Order the items by value per weight, highest first.
    ordered = []
    for vi, wi in zip(v, w):
        ordered.append((vi/wi, vi, wi))
    ordered.sort(reverse=True)
    X = []
    weight = 0
    val = 0
    # Add the items in that order, as long as they fit.
    for _, vi, wi in ordered:
        if weight + wi <= W:
            X.append((vi, wi))
            weight += wi
            val += vi
    return X, val

The above algorithm is not final, but let’s start by understanding what it does.

First it orders the items by highest value per weight. Then it simply adds the items in that order, as long as they fit.

Hence, if we have the items from Step 1.

Items from Step 1

Then we would have them ordered: Item 2 (15 $/kg), Item 3 (11.67 $/kg), and Item 1 (10 $/kg). Hence, the algorithm would add Item 2 first, as it gives the most value per kg. Then it adds Item 3, as this is the remaining item which adds the most value per kg. Item 1 no longer fits, so there is no more to add.
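
Running it on the items from Step 1 confirms this (a quick check, assuming the greedy_algorithm function above):

w = [2, 1, 3]     # weights in kg
v = [20, 15, 35]  # values in USD
X, val = greedy_algorithm(w, v, 4)
print(X, val)  # outputs: [(15, 1), (35, 3)] 50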

Why is this a good strategy? Well, you add the items in the order of what adds the most value per weight.

Step 3: Modifying the greedy approximation to give at least 1/2 of the optimal solution

In the example from Step 2, it all turned out fine.

But we have a problem.

Imagine these items: Item 1′ of value $15 and weight 1 kg, and Item 2′ of value $40 and weight 4 kg. We still assume we can carry 4 kg.

The above algorithm would order them as Item 1′ (15 $/kg), then Item 2′ (10 $/kg).

That would result in taking only Item 1′ of 1 kg and value $15, because the next item, Item 2′, weighs too much to bring along.

The optimal solution is taking Item 2′ for $40 of value.

The algorithm from the previous step can be modified a bit to ensure at least 1/2 of the value of the optimal solution. Notice that the above example does not even achieve that.

Let’s examine the following modification.

def modified_greedy_algorithm(w, v, W):
    # Order the items by value per weight, highest first.
    ordered = []
    for vi, wi in zip(v, w):
        ordered.append((vi/wi, vi, wi))
    ordered.sort(reverse=True)
    S1 = []
    weight = 0
    val = 0
    for _, vi, wi in ordered:
        if weight + wi <= W:
            S1.append((vi, wi))
            weight += wi
            val += vi
        else:
            # First item that does not fit: compare it alone
            # against everything added so far.
            S2 = [(vi, wi)]
            val2 = vi
            if val > val2:
                return S1, val
            else:
                return S2, val2
    return S1, val  # all items fit

It is almost identical, except that when it reaches the first item that does not fit, it checks whether that item alone gives a higher value than the items already added. (Note that the original version was missing a final return for the case where all items fit; that is fixed above.)
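
On the problematic example from above, the modified version picks the right set (a quick check, assuming the modified_greedy_algorithm function above):

w = [1, 4]    # Item 1', Item 2' (kg)
v = [15, 40]  # values in USD
S, val = modified_greedy_algorithm(w, v, 4)
print(S, val)  # outputs: [(40, 4)] 40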

Why does that give at least 1/2 of the optimal solution?

Well, if we could take partial items, then the optimal value would be given by the following. Let k be the first item that does not fit. Then items 1, 2, …, k - 1, plus a portion of item k, form the optimal solution. Let that portion be given by alpha. Notice that alpha is greater than or equal to 0 and less than or equal to 1.

Hence, an upper bound on the optimal value is given by v_1 + v_2 + … + v_(k-1) + alpha*v_k = V_max_bound.

Since alpha ≤ 1, we have v_1 + v_2 + … + v_(k-1) + v_k ≥ V_max_bound. The algorithm returns the larger of the two values: the sum of items 1, 2, …, k - 1, or v_k alone. The larger of two numbers is at least half of their sum, so the result is at least half of V_max_bound, and therefore at least half of the optimal value.

How-To Bracket all Possible Evaluations of a Binary Operator for an Arbitrary Input Length

What will we cover in this tutorial?

Consider the addition in the following expression: 5 + 7 + 3. As addition is associative, the order of evaluation does not matter. Hence, we have that (5 + 7) + 3 = 5 + (7 + 3).

For an arbitrary binary operator op(x_1, x_2) -> y, you cannot assume that op(x_1, op(x_2, x_3)) = op(op(x_1, x_2), x_3). Hence, the order of evaluation of a binary operator matters.

The brackets give the order in which the expression is evaluated. Hence, if an operator takes three inputs, a, b, or c, we can write a(ab) to denote that ab is evaluated first, and then the result of ab is combined with a.

This tutorial will show you how to get all possible evaluations of an input of arbitrary length. That is, given an input, e.g., abab, how to find a list of all possible evaluations of the input: [(a(b(ab))), (a((ba)b)), ((ab)(ab)), ((a(ba))b), (((ab)a)b)].

Consider that for a moment. How do you program that in Python?

Step 1: Define the problem

For simplicity we assume we have a binary operator op: {a, b, c} x {a, b, c} -> {a, b, c}. The operator can take arguments op(a, b), evaluate them, and give, say, c.

To make the notation more compact, we will write op(ab) = c. If we make an evaluation table of the operator, say, we have the following.

op | a | b | c
---+---+---+---
 a | c | c | b
 b | a | c | b
 c | b | a | a

Evaluation of the operator op (row: first argument, column: second argument).

Hence, we have that op(aa) = c, op(ab) = c, …, op(cc) = a.

Now notice that this operator is neither associative nor commutative.

It is not commutative, as op(ab) ≠ op(ba).

And it is not associative, as op(a op(ac)) = op(ab) = c, while op(op(aa) c) = op(cc) = a.

Now we understand why a binary operator needs an order of evaluation: the final output depends on it.
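
As a small aside, one way to represent the table in Python is a nested dictionary (a sketch; the op_table and op names are ours and are not used in the code later in this tutorial):

# The table above: op_table[x][y] is the result of op(xy).
op_table = {
    'a': {'a': 'c', 'b': 'c', 'c': 'b'},
    'b': {'a': 'a', 'b': 'c', 'c': 'b'},
    'c': {'a': 'b', 'b': 'a', 'c': 'a'},
}

def op(x, y):
    return op_table[x][y]

print(op('a', op('a', 'c')))  # op(a op(ac)) = op(ab) = c
print(op(op('a', 'a'), 'c'))  # op(op(aa) c) = op(cc) = a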

What we want now is, given an input, e.g., aac, how many ways can you set brackets to get a different evaluation of a binary operator: [(a(ac)), ((aa)c)].

Step 2: Breaking the problem down

Now we understand why the problem is important. But how do we solve it?

Break it down and think like a programmer.

The base case is a string of length 2, e.g., aa, which has only one possible bracketing, (aa).

Then we have the case of length 3, e.g., aac, which can be bracketed in two ways: ((aa)c), (a(ac)). Another way to think of it: you can break a string of length 3 into a string of length 2 + 1 or 1 + 2.

Then the case of length 4, say aaab, can be broken down into (((aa)a)b), ((a(aa))b), ((aa)(ab)), (a((aa)b)), (a(a(ab))). Or you could think of it as 3 + 1, 2 + 2, 1 + 3. Wait, what does that mean?

A drawing might come in handy now.

A visualization of the way to break down the bracket problem

See, a string of length 4 can be broken down into an instance of length 1 (as the first argument to the operator) and length 3, an instance of length 2 and 2 (each as input to the operator), or of length 3 and 1 (each as input to the operator).

Hence, you break the problem down recursively. See the counting sketch below.
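
This split structure also tells us how many bracketings to expect. Here is a small sketch that counts them with the same recursion (the count_bracketings name is ours; the counts are the well-known Catalan numbers):

def count_bracketings(n):
    # Ways to fully bracket a string of length n by splitting it
    # into a left part of length i and a right part of length n - i,
    # exactly as in the diagrams above.
    if n == 1:
        return 1
    return sum(count_bracketings(i) * count_bracketings(n - i)
               for i in range(1, n))

for n in range(2, 6):
    print(n, count_bracketings(n))  # prints: 2 1, 3 2, 4 5, 5 14

For a string of length 5 this gives 14 bracketings, matching the output in Step 3 below.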

For a string of length 5 you have the following diagram.

A string of length 5

Amazing. Now we just need to turn that into code.

Step 3: How to create a list of all possible brackets for a binary operator

Consider the diagram from the last step. Then think about recursion. How does it all add up?

For recursion, you need a base case. It could be a string of length 2, but let’s do it for length 1 instead, as you can see we want to call with length 1.

def get_all_variations(x):
    if len(x) == 1:
        return [x]  # base case: a single character
    solutions = []
    for i in range(1, len(x)):
        # Split into two non-empty parts and recurse on each.
        x1 = x[:i]
        x2 = x[i:]
        res1 = get_all_variations(x1)
        res2 = get_all_variations(x2)
        for r1 in res1:
            for r2 in res2:
                solutions.append('(' + r1 + r2 + ')')
    return solutions

res = get_all_variations('aaabc')
for r in res:
    print(r)

This shows, with an example, how simple it is to create. The function has the base case, where it just returns the single-character string in a list. In the general case, it breaks the string down into two non-empty strings and calls itself recursively on each.

For each pair of sub-results, it combines them in brackets and appends the combination to the list of solutions.

The above code gives the following output.

(a(a(a(bc))))
(a(a((ab)c)))
(a((aa)(bc)))
(a((a(ab))c))
(a(((aa)b)c))
((aa)(a(bc)))
((aa)((ab)c))
((a(aa))(bc))
(((aa)a)(bc))
((a(a(ab)))c)
((a((aa)b))c)
(((aa)(ab))c)
(((a(aa))b)c)
((((aa)a)b)c)

Conclusion

This is an amazing example of how to solve a computer science problem. The method is a general way to break a problem down into a recursive function. Learning the skills in this tutorial is crucial to becoming a good problem solver in computer science.