Learn how you can become a Python programmer in just 12 weeks.

    We respect your privacy. Unsubscribe at anytime.

    How-To Solve the Edit Distance Problem with Dynamic Programming

    What will we cover in this tutorial?

    In this tutorial we will make a dynamic programming solution to the Edit Distance Problem.

    Consider the following problem. Given a string u = “abbabc” and v = “ababac”, and the operations to insert, remove or change a character in string u. What is the minimum amount of operations you need to transform u into v.

    1. Remove b: u = “abbabc”remove b -> u’ = “ababc”.
    2. Insert a: u’ = “ababc” – insert a -> u” = “ababac”.

    Hence, with two operations we transform u into v.

    How can you make a program that solves the general problem for two arbitrary strings, u and v, with minimal operations? And even better, how can we solve it with dynamic programming and how does that help us?

    Step 1: Solve the Edit Distance Problem without Dynamic Programming

    It is a good idea to solve a problem with a normal approach first. This makes you understand the problem at heart.

    To get started with solving a problem like this, it is often a good idea to look at some simple base cases.

    Assume that u and v are either the empty string () or one character (‘a’, ‘b’, or ‘c’, to limit it).

    We have some cases here.

    1. If we have that u = ” (the empty string) and v = ‘a’, then we need to insert ‘a’ into u to get v.
    2. If we have that v=” and u=’b’, then we need to remove a character from u to get v.
    3. Then, if we have that u=’a’ and v=’a’, then do not need to do anything, as they are equal.
    4. Finally, if we have that u=’a’ and v=’b’, then we can use the change operations to turn u into v.

    Using that logic, can we turn that into a recursive function, which takes the last character of the sting with the above operations.

    There is a catch, of course, we need to consider case 4. In that case, we should try all three options and use the best one.

    Let’s try to write that as Python code.

    def naive_edit_distance(u, v):
        if len(u) == 0:
            return len(v) # Number of insert operations to fill out.
        if len(v) == 0:
            return len(u) # Number of delete operations to remove chars
        if u[-1] == v[-1]: # Characters are identical, no operations
            return naive_edit_distance(u[:-1], v[:-1])
        return min(naive_edit_distance(u, v[:-1]) + 1, # insert operation
                   naive_edit_distance(u[:-1], v) + 1, # delete operation
                   naive_edit_distance(u[:-1], v[:-1]) + 1) # change operation
    

    Now, let us first see how this works. Assume we call with the following arguement

    naive_edit_distance("ab", "bc")
    

    Then let’s make a call tree to see how that would run with the above code.

    Now the result will be propagated all the way up, where each of the leaves will take the minimum value.

    This will result in 2 operations is optimal. Also, notice that the operations are not unique. We see two paths given the same number of operations.

    Step 2: Solving the Edith Distance problem with dynamic programming

    If you investigate the above call tree, then you will notice, that some calls are identical. E.g., the call in with u=’a’ and v=’b’ occurs twice, and we calculate the same 6 consecutive calls below.

    With dynamic programming you can avoid calculating the same calls again and again.

    The easiest way to turn the above algorithm into a dynamic algorithm, is by storing the value of each call in a global value. Then for each calculated value, store it, and before making a new calculation, check if it exists already.

    This turns into the following code.

    # A global dictionary to store already called values
    results = {}
    
    def get_distance_dynamic_programming(u, v):
        global results
        if (u, v) in results: # Check if call already is calculated
            return results[(u, v)]
        if len(u) == 0: 
            results[(u, v)] = len(v) # Number of insert operations to fill out.
        elif len(v) == 0:
            results[(u, v)] = len(u) # Number of delete operations to remove chars
        elif u[-1] == v[-1]: # Characters are identical, no operations
            results[(u, v)] = get_distance_dynamic_programming(u[:-1], v[:-1])
        else:
            results[(u, v)] = min(get_distance_dynamic_programming(u, v[:-1]) + 1, # insert operation
                                  get_distance_dynamic_programming(u[:-1], v) + 1, # delete operation
                                  get_distance_dynamic_programming(u[:-1], v[:-1]) + 1) # change operation
        return results[(u, v)]
    

    Hence, the main difference is that we have a global variable, called results, we check in the beginning if the call value (the arguments, u and v) exists, if so return the value.

    Else, we do the same, except that we store the value in results, before returning it.

    Step 3: Return the operations used to transform the string

    Imagine you need to return the operations you used to transform string u to v.

    This can be done by the following code.

    # Solution which returns the operations
    operations = {}
    
    def get_distance_operations(u, v):
        global operations
        if (u, v) in operations:
            return operations[(u, v)]
        if len(u) == 0:
            operations[(u, v)] = (len(v), [("insert", c) for c in v])
        elif len(v) == 0:
            operations[(u, v)] = (len(u), [("remove", c) for c in u])
        elif u[-1] == v[-1]:
            operations[(u, v)] = get_distance_operations(u[:-1], v[:-1])
        else:
            val1, ops1 = get_distance_operations(u, v[:-1])
            val2, ops2 = get_distance_operations(u[:-1], v)
            val3, ops3 = get_distance_operations(u[:-1], v[:-1])
            if val1 <= val2 and val1 <= val3:
                operations[(u, v)] = (val1 + 1, ops1 + [("insert", v[-1])])
            elif val2 <= val1 and val2 <= val3:
                operations[(u, v)] = (val2 + 1, ops2 + [("remove", u[-1])])
            else:
                operations[(u, v)] = (val3 + 1, ops3 + [("change", u[-1] + "->" + v[-1])])
        return operations[(u, v)]
    
    res = get_distance_operations("abbabc", "ababac")
    print(res)
    

    The above code would return the following output.

    (2, [('remove', 'b'), ('insert', 'a')])
    

    That is amazing. Another cool problem solved.

    Python Circle

    Do you know what the 5 key success factors every programmer must have?

    How is it possible that some people become programmer so fast?

    While others struggle for years and still fail.

    Not only do they learn python 10 times faster they solve complex problems with ease.

    What separates them from the rest?

    I identified these 5 success factors that every programmer must have to succeed:

    1. Collaboration: sharing your work with others and receiving help with any questions or challenges you may have.
    2. Networking: the ability to connect with the right people and leverage their knowledge, experience, and resources.
    3. Support: receive feedback on your work and ask questions without feeling intimidated or judged.
    4. Accountability: stay motivated and accountable to your learning goals by surrounding yourself with others who are also committed to learning Python.
    5. Feedback from the instructor: receiving feedback and support from an instructor with years of experience in the field.

    I know how important these success factors are for growth and progress in mastering Python.

    That is why I want to make them available to anyone struggling to learn or who just wants to improve faster.

    With the Python Circle community, you can take advantage of 5 key success factors every programmer must have.

    Python Circle
    Python Circle

    Be part of something bigger and join the Python Circle community.

    Leave a Comment