How-To Solve the Edit Distance Problem with Dynamic Programming

What will we cover in this tutorial?

In this tutorial we will make a dynamic programming solution to the Edit Distance Problem.

Consider the following problem. Given a string u = “abbabc” and v = “ababac”, and the operations to insert, remove or change a character in string u. What is the minimum amount of operations you need to transform u into v.

  1. Remove b: u = “abbabc”remove b -> u’ = “ababc”.
  2. Insert a: u’ = “ababc” – insert a -> u” = “ababac”.

Hence, with two operations we transform u into v.

How can you make a program that solves the general problem for two arbitrary strings, u and v, with minimal operations? And even better, how can we solve it with dynamic programming and how does that help us?

Step 1: Solve the Edit Distance Problem without Dynamic Programming

It is a good idea to solve a problem with a normal approach first. This makes you understand the problem at heart.

To get started with solving a problem like this, it is often a good idea to look at some simple base cases.

Assume that u and v are either the empty string () or one character (‘a’, ‘b’, or ‘c’, to limit it).

We have some cases here.

  1. If we have that u = ” (the empty string) and v = ‘a’, then we need to insert ‘a’ into u to get v.
  2. If we have that v=” and u=’b’, then we need to remove a character from u to get v.
  3. Then, if we have that u=’a’ and v=’a’, then do not need to do anything, as they are equal.
  4. Finally, if we have that u=’a’ and v=’b’, then we can use the change operations to turn u into v.

Using that logic, can we turn that into a recursive function, which takes the last character of the sting with the above operations.

There is a catch, of course, we need to consider case 4. In that case, we should try all three options and use the best one.

Let’s try to write that as Python code.

def naive_edit_distance(u, v):
    if len(u) == 0:
        return len(v) # Number of insert operations to fill out.
    if len(v) == 0:
        return len(u) # Number of delete operations to remove chars
    if u[-1] == v[-1]: # Characters are identical, no operations
        return naive_edit_distance(u[:-1], v[:-1])
    return min(naive_edit_distance(u, v[:-1]) + 1, # insert operation
               naive_edit_distance(u[:-1], v) + 1, # delete operation
               naive_edit_distance(u[:-1], v[:-1]) + 1) # change operation

Now, let us first see how this works. Assume we call with the following arguement

naive_edit_distance("ab", "bc")

Then let’s make a call tree to see how that would run with the above code.

Now the result will be propagated all the way up, where each of the leaves will take the minimum value.

This will result in 2 operations is optimal. Also, notice that the operations are not unique. We see two paths given the same number of operations.

Step 2: Solving the Edith Distance problem with dynamic programming

If you investigate the above call tree, then you will notice, that some calls are identical. E.g., the call in with u=’a’ and v=’b’ occurs twice, and we calculate the same 6 consecutive calls below.

With dynamic programming you can avoid calculating the same calls again and again.

The easiest way to turn the above algorithm into a dynamic algorithm, is by storing the value of each call in a global value. Then for each calculated value, store it, and before making a new calculation, check if it exists already.

This turns into the following code.

# A global dictionary to store already called values
results = {}

def get_distance_dynamic_programming(u, v):
    global results
    if (u, v) in results: # Check if call already is calculated
        return results[(u, v)]
    if len(u) == 0: 
        results[(u, v)] = len(v) # Number of insert operations to fill out.
    elif len(v) == 0:
        results[(u, v)] = len(u) # Number of delete operations to remove chars
    elif u[-1] == v[-1]: # Characters are identical, no operations
        results[(u, v)] = get_distance_dynamic_programming(u[:-1], v[:-1])
    else:
        results[(u, v)] = min(get_distance_dynamic_programming(u, v[:-1]) + 1, # insert operation
                              get_distance_dynamic_programming(u[:-1], v) + 1, # delete operation
                              get_distance_dynamic_programming(u[:-1], v[:-1]) + 1) # change operation
    return results[(u, v)]

Hence, the main difference is that we have a global variable, called results, we check in the beginning if the call value (the arguments, u and v) exists, if so return the value.

Else, we do the same, except that we store the value in results, before returning it.

Step 3: Return the operations used to transform the string

Imagine you need to return the operations you used to transform string u to v.

This can be done by the following code.

# Solution which returns the operations
operations = {}

def get_distance_operations(u, v):
    global operations
    if (u, v) in operations:
        return operations[(u, v)]
    if len(u) == 0:
        operations[(u, v)] = (len(v), [("insert", c) for c in v])
    elif len(v) == 0:
        operations[(u, v)] = (len(u), [("remove", c) for c in u])
    elif u[-1] == v[-1]:
        operations[(u, v)] = get_distance_operations(u[:-1], v[:-1])
    else:
        val1, ops1 = get_distance_operations(u, v[:-1])
        val2, ops2 = get_distance_operations(u[:-1], v)
        val3, ops3 = get_distance_operations(u[:-1], v[:-1])
        if val1 <= val2 and val1 <= val3:
            operations[(u, v)] = (val1 + 1, ops1 + [("insert", v[-1])])
        elif val2 <= val1 and val2 <= val3:
            operations[(u, v)] = (val2 + 1, ops2 + [("remove", u[-1])])
        else:
            operations[(u, v)] = (val3 + 1, ops3 + [("change", u[-1] + "->" + v[-1])])
    return operations[(u, v)]

res = get_distance_operations("abbabc", "ababac")
print(res)

The above code would return the following output.

(2, [('remove', 'b'), ('insert', 'a')])

That is amazing. Another cool problem solved.

Leave a Reply

%d bloggers like this: