How to Export Pandas DataFrame to Excel and Create a Trendline Graph of Scatter Plot

What will we cover in this tutorial?

We will have some data in a Pandas DataFrame, which we want to export to an Excel sheet. Then we want to create a Scatter plot graph and fit that to a Excel trendline.

Step 1: Get the data

You might have some data already that you want to use. It can be from a HTML page (example) or CSV file.

For this purpose here we just generate some random data to use. We will use NumPy’s uniform function to generate it.

import pandas as pd
import numpy as np


# Generate some random increasing data
data = pd.DataFrame(
    {'A': [np.random.uniform(0.1*i, 0.1*i + 1) for i in range(100)],
     'B': [np.random.uniform(0.1*i, 0.1*i + 1) for i in range(100)]}
)
print(data)

Which will generate some slightly increasing data, which is nice to fit a graph to.

The output could look something like this.

            A          B
0    0.039515   0.778077
1    0.451888   0.210705
2    0.992493   0.961428
3    0.317536   1.046444
4    1.220419   1.388086

Step 2: Create an Excel XlsxWriter engine

This step might require that you install the XlsxWriter library, which is needed from the Pandas library.

This can be done by the following command.

pip install xlsxwriter

Now we can create the engine in our code.

import pandas as pd
import numpy as np


# Generate some random increasing data
data = pd.DataFrame(
    {'A': [np.random.uniform(0.1*i, 0.1*i + 1) for i in range(100)],
     'B': [np.random.uniform(0.1*i, 0.1*i + 1) for i in range(100)]}
)

# Create a Pandas Excel writer using XlsxWriter
excel_file = 'output.xlsx'
sheet_name = 'Data set'
writer = pd.ExcelWriter(excel_file, engine='xlsxwriter')

This will setup a Excel writer engine and be ready to write to file output.xlsx.

Step 3: Write the data to Excel and create a scatter graph with a fitted Trendline

This can be done by the following code, which uses the add_series function to insert a graph.

import pandas as pd
import numpy as np


# Generate some random increasing data
data = pd.DataFrame(
    {'A': [np.random.uniform(0.1*i, 0.1*i + 1) for i in range(100)],
     'B': [np.random.uniform(0.1*i, 0.1*i + 1) for i in range(100)]}
)

# Create a Pandas Excel writer using XlsxWriter
excel_file = 'output.xlsx'
sheet_name = 'Data set'
writer = pd.ExcelWriter(excel_file, engine='xlsxwriter')
data.to_excel(writer, sheet_name=sheet_name)

# Access the XlsxWriter workbook and worksheet objects from the dataframe.
workbook = writer.book
worksheet = writer.sheets[sheet_name]

# Create a scatter chart object.
chart = workbook.add_chart({'type': 'scatter'})

# Get the number of rows and column index
max_row = len(data)
col_x = data.columns.get_loc('A') + 1
col_y = data.columns.get_loc('B') + 1

# Create the scatter plot, use a trendline to fit it
chart.add_series({
    'name':       "Samples",
    'categories': [sheet_name, 1, col_x, max_row, col_x],
    'values':     [sheet_name, 1, col_y, max_row, col_y],
    'marker':     {'type': 'circle', 'size': 4},
    'trendline': {'type': 'linear'},
})

# Set name on axis
chart.set_x_axis({'name': 'Concentration'})
chart.set_y_axis({'name': 'Measured',
                  'major_gridlines': {'visible': False}})

# Insert the chart into the worksheet in field D2
worksheet.insert_chart('D2', chart)

# Close and save the Excel file
writer.save()

Result

The result should be similar to this.

The resulting Excel sheet.

That is how it can be done.

How To Solve Tower of Hanoi with Recursion

What will we cover in this tutorial?

We will first explain and understand Tower of Hanoi programming challenge. It is one of those programming challenges that are highly liked among programmers.

  • Understanding the problem is easy.
  • Solving it seems difficult.
  • Using recursion makes it easy and elegant to solve.

You should be one of those that masters the Tower of Hanoi.

Step 1: Understand the Tower of Hanoi challenge

Tower of Hanoi is a mathematical game, which has three rules. Before we set the rules, let’s see how our universe looks like.

A basic setup of Tower of Hanoi with 3 disks and 3 towers (often called rods)

The disk all have different sizes as pictured above.

The goal is to move all the disks from on tower (rod) to another one with the following 3 rules.

  1. You can only move one disk at the time.
  2. You can only take the top disk and place on top of another tower (rod).
  3. You cannot place a bigger disk on top of a smaller disk.

The first two rules combined means that you can only take one top disk and move it.

Say, in the above we have moved the disk 1 from the first to the second tower (rod).

After that move, we can move disk 2 or disk 1.

The third rule says, that we cannot move disk 2 on top of disk 1.

Those are the 3 rules of the game.

Now do yourself a favor and try to think how you would solve that. How do you get from here.

To here following the 3 rules.

Step 2: Recall recursion and unleash the power of it

Recursion is a method of solving a problem where the solution depends on solutions to smaller instances of the same problem.

https://en.wikipedia.org/wiki/Recursion_(computer_science)

While that is a beautiful and perfect definition – there is still need to exemplify what that means.

A simple example is to sum up the numbers from 1 to n.

It can be a bit difficult to connect the definition of recursions to getting the sum of the integers 1 + 2+ 3 + … + (n – 1) + n.

Let’s first try to do in the iterative way.

def summation(n):
    sum = 0
    for i in range(1, n + 1):
        sum += i
    return sum

print(summation(10))

While there is thing wrong with the above solution, you can turn that into a recursive function by the following.

def summation(n):
    if n == 0:
        return 0
    return n + summation(n - 1)

print(summation(10))

As you see, the problem of summing the numbers from 1 to n, is actually reversed to sum the numbers n, n – 1, n – 2, …, 1. You also notice, that it is true that the summation(n) is equal to n + summation(n – 1).

Wow, that was it. We are breaking the problem down to a problem of smaller size. Just like the definition said.

Also, notice the importance to have a base case in the function. In the above we choose for n == 0 to return 0. This ensures that the recursive calls to not continue forever (or when the Python interpreters stops due to maximum recursion depth).

Try it.

def summation(n):
    return n + summation(n - 1)

print(summation(10))

Well, what did we gain from making the function recursive?

Good question. The above might not be a good example of how recursion helps you. The example of Tower of Hanoi will show you the benefit. It will make your code easy and straight forward.

Step 3: Implement Tower of Hanoi with a recursive function

Now we need to think recursive. Consider the problem again.

How can we break that down to a smaller problem?

Think backwards. Just like the summation from Step 2. What do we need to make happen if we should move disk 3 from first tower (rod) to the last tower (rod)?

Exactly. Then we can move disk 3 to the final destination.

And after that, we should move the smaller problem of the 2 disks on top fo disk 3.

Wow. That is the formula. It is all you need to know.

  1. Move the smaller problem of 2 disks from first tower (rod) to second tower (rod).
  2. Move the big disk from first tower (rod) to last tower (rod).
  3. Move the smaller problem of 2 disks from second tower (rod) to last tower (rod).

Now we need to generalize that.

First understand that there can be any number of disks in an instance of Tower of Hanoi. This means that the problem starts with n disks. If we number the towers (rods) 0, 1, and 2.

Then we have that all n disks start on tower (rod) 0 and should end in tower (rod) 2.

Now we can break down the problem to the following. Given n disks that needs to be moved from start_tower to dest_tower (destination), using aux_tower (auxiliary).

  1. Move subproblem of n – 1 disks from start_tower to aux_tower.
  2. Move disk n to dest_tower.
  3. Move subproblem of n – 1 disk from aux_tower to dest_tower.

See the code below.

class Towers:
    def __init__(self, disks=3):
        self.disks = disks
        self.towers = [[]]*3
        self.towers[0] = [i for i in range(self.disks, 0, -1)]
        self.towers[1] = []
        self.towers[2] = []

    def __str__(self):
        output = ""
        for i in range(self.disks, -1, -1):
            for j in range(3):
                if len(self.towers[j]) > i:
                    output += " " + str(self.towers[j][i])
                else:
                    output += "  "
            output += "\n"

        return output + "-------"

    def move(self, from_tower, dest_tower):
        disk = self.towers[from_tower].pop()
        self.towers[dest_tower].append(disk)


def solve_tower_of_hanoi(towers, n, start_tower, dest_tower, aux_tower):
    # Base case - do nothing
    if n == 0:
        return

    # Move subproblem of n - 1 disks from start_tower to aux_tower.
    solve_tower_of_hanoi(towers, n - 1, start_tower, aux_tower, dest_tower)

    # Move disk n to dest_tower.
    towers.move(start_tower, dest_tower)
    print(towers)

    # Move subproblem of n - 1 disk from aux_tower to dest_tower.
    solve_tower_of_hanoi(towers, n - 1, aux_tower, dest_tower, start_tower)


t = Towers()
print(t)
solve_tower_of_hanoi(t, len(t.towers), 0, 2, 1)

The code includes a simple print function to see the trace of the movings.

Too fast?

Now this often seems a bit too fast. Didn’t we leave out all the subproblems?

That is the beauty of it. We actually didn’t

You tell the machine how to solve the problem using a smaller instance of the problem. This was done with the three things we did. First, move the subproblem away. Second, move the the biggest disk. Finally, move the subproblem on top of biggest disk.

How does that solve it all. See, you solve it for general n, hence, the smaller subproblems solves it by the same formula and we ensure the base case when there are no disks to move.

Master Selection Sort in Python in 3 Steps

What will we cover in this tutorial

Selection sort is one of the simplest sorting algorithms, which is a good algorithm to start with. While the algorithm is considered to be slow, it has the advantage of not using auxiliary space.

Step 1: Understand the Selection Sort algorithm

The goal of sorting is to take an unsorted array of integers and sort it.

Example given below.

[97, 29, 53, 92, 42, 36, 12, 57, 90, 76, 85, 81, 12, 61, 45, 3, 83, 34, 7, 48]

to

[3, 7, 12, 12, 29, 34, 36, 42, 45, 48, 53, 57, 61, 76, 81, 83, 85, 90, 92, 97]

The algorithm is the most intuitive way of sorting a list.

It works as follows.

  1. Go through the list to be sorted and find the smallest element.
  2. Switch the smallest element with the first position.

If you started with the following list.

[97, 29, 53, 92, 42, 36, 12, 57, 90, 76, 85, 81, 12, 61, 45, 3, 83, 34, 7, 48]

You would now have this list.

[3, 29, 53, 92, 42, 36, 12, 57, 90, 76, 85, 81, 12, 61, 45, 97, 83, 34, 7, 48]

Notice, that now we have the smallest element in the front of the list, we know that the second smallest element must be somewhere in the list starting from the second position all the way to the end.

Hence, you can repeat step the above 2 steps on the list excluding the first element.

This will give you the following list.

[3, 7, 53, 92, 42, 36, 12, 57, 90, 76, 85, 81, 12, 61, 45, 97, 83, 34, 29, 48]

Now we have that the first two elements are sorted, while the rest of the list is not sorted.

Hence, we can repeat the two steps again on the unsorted part of the list.

If we continue this until the we reach the end of the list. This should give us a sorted list.

Step 2: Implementation of Selection Sort

A beautiful thing about Selection Sort is that it does not use any auxiliary memory. If you are new to sorting, then this can be a big advantage if sorting large data sets.

The disadvantage of Selection Sort is the time complexity.

We will come back to that later.

The code of Selection Sort can be done in the following manner.

def selection_sort(list_to_sort):
    for i in range(len(list_to_sort)):
        index_of_min_value = i
        for j in range(i + 1, len(list_to_sort)):
            if list_to_sort[j] < list_to_sort[index_of_min_value]:
                index_of_min_value = j

        list_to_sort[i], list_to_sort[index_of_min_value] = list_to_sort[index_of_min_value], list_to_sort[i]


list_to_sort = [97, 29, 53, 92, 42, 36, 12, 57, 90, 76, 85, 81, 12, 61, 45, 3, 83, 34, 7, 48]
selection_sort(list_to_sort)
print(list_to_sort)

This will produce the correct output.

[3, 7, 12, 12, 29, 34, 36, 42, 45, 48, 53, 57, 61, 76, 81, 83, 85, 90, 92, 97]

Step 3: The time complexity of Selection Sort algorithm

Now this is the sad part of this simple algorithm. It does not perform good. A sorting algorithm is considered efficient if it runs in O(n log(n)), which Selection Sort does not.

The simple time complexity analysis is as follows.

Assume we have a list of n unsorted integers. Then the first iteration of the list will make n – 1 comparisons, the second iteration will make n – 2 comparisons, and so forth all the way down to 1 comparison.

This is the sum of 1 to n – 1, which is found by this formula (n – 1)(n – 2)/2, which is O(n^2).

Other than that the algorithm does n swapping of numbers. This is O(n).

This combines the algorithm to O(n + n^2) = O(n^2).

Next Step

This should wake your appetite to understand how you can make more efficient sorting.

Another good example of a simple sorting algorithm is the Insertion Sort algorithm.

For more efficient algorithm you should check out the Merge Sort algorithm.

If you want to be serious about sorting, check out my online course on the subject.

Why use Exceptions? How-to use Exceptions in the correct way!

What will we cover in this tutorial?

The key to use exceptions correct is to understand why we use them.

The best way to understand why to use exceptions is to see what happens when we do not use exceptions in our code.

After that exploration we will show how it can solve the examples with exceptions.

Step 1: In the perfect world!

Remember those times before the exception was invented?

No? Well, of course not. It is long time ago, in a distant past in a programming language you probably have not heard about.

As an aspiring programmer, you have probably seen and heard about exceptions, but never put them to use yourself.

The best way is to create examples that can be greatly simplified with exceptions.

Let’s keep the world simple.

def add(a, b):
    return a + b


print(add(2, 3))

This would print out the result 5.

As long as the world is clean and everybody uses the function correct, there is nothing to worry about.

But what if someone calls the function with wrong types of arguments. Then the function does not do as expected.

def add(a, b):
    return a + b


print(add("2", "3"))

This will print 23, which might be a bit surprising if you are not familiar with the function, and possibly how Python uses addition on strings.

So how to handle it.

Step 2: In the real world where people do not use things as intended

As you will realize in your careers of programming, it happens that other programmers do not use your function correctly.

Why would they do that?

Maybe they don’t take the time to read the good documentation you wrote. Or maybe they are just careless and working under hard time pressure.

Even more funny, it could be you. It could be code you wrote a year ago (or less), and the documentation was not that good as you thought, hence, you use your own function incorrectly.

To continue the simple example. One way to handle wrong input without exceptions could be as follows.

def add(a, b):
    if type(a) is not int or type(b) is not int:
        return None
    return a + b


result = add("2", "3")
if result is not None:
    print(result)
else:
    print("Invalid input format to function add!")

Oh, no! What happened. Actually, your function is not that difficult to understand. It just starts by checking if the input format is as expected. If not, return an error code. As this is Python, an error code can simply be the value None.

That might seem fine. But what about the user of your function? Now the user needs to validate that the correct format of the result was returned.

This makes the code more complex. Also, the user of your function needs to know more details of how your function handles invalid input, or whatever can go wrong in your function.

The problem is, that you pass on a problem to someone else, which needs to know about the details of how your function works.

This is the opposite of decoupling. And decoupling is considered good. Easier to program, easier to change, easier to maintain, to mention a few benefits.

Step 3: Solve it with exceptions

The core idea of exceptions is to handle the when something out of the ordinary happens from the main logic of a program.

What? Yes, the expected case in our simple function, is that the user provides integers as input. But it might happen they do not call it with integers. That is unexpected, that is something out of the ordinary, and this breaks the logic in how you would build the program.

So how should we do in the above example?

def add(a, b):
    if type(a) is not int or type(b) is not int:
        raise Exception("Input format incorrect")
    return a + b

Actually, your part of the obligations becomes easier.

How?

Because now you do not need to write complex documentation of the error return codes in your program. The documentation is kept automatically in the exception. It will even guide the user to where in the code the exception comes from.

def add(a, b):
    if type(a) is not int or type(b) is not int:
        raise Exception("Input format incorrect")
    return a + b


result = add("2", "3")

This would give the following result.

Traceback (most recent call last):
  File "/Users/admin/PycharmProjects/LearningSpace/test2.py", line 7, in <module>
    result = add("2", "3")
  File "/Users/admin/PycharmProjects/LearningSpace/test2.py", line 3, in add
    raise Exception("Input format incorrect")
Exception: Input format incorrect

Which leads the user of the function to where in the code it went wrong. Then it will be easier for them to figure out what is wrong and how to fix it.

Step 4: Try-catch from the user

It happens that the user of your function would like to handle the special case.

They now master what is going wrong, and want to inform their users of the problem.

def add(a, b):
    if type(a) is not int or type(b) is not int:
        raise Exception("Input format incorrect")
    return a + b


try:
    result = add("2", "3")
except:
    print("The input format is incorrect")

This leads to something you might not notice at first. It leads to a good easy flow in your program. It is easy to read and understand.

It is clear that this is not intended flow in the except-part of the program. Also, the normal flow in the program would be in the try-part.

It makes it easier to understand, which is the ultimate goal of a good programmer. Make your code easy to read for others. That is, including you in less than 6 months (we do forget our own code and the logic in the programs we write faster than we expect).

Understand why Anagram is a Favorite Interview Coding Problem

What will we cover in this tutorial?

Wonder why some of these trivial questions are popular in coding interviews?

The truths is, they have a depth of hidden aspects which uncovers whether you understand the the true nature of the problem.

Often, there is no right and wrong solution. It depends. Can you answer what it depends on?

In this tutorial we will investigate the Anagram Problem.

What they are really looking for is your underlying understanding of algorithmic time complexity (big-O analysis) and your ability to improve on solutions.

This tutorial will show one example of this.

Step 1: What is the Anagram Problem?

The first part of a coding challenge problem is to show that you understand it. Many underestimate this part and do not realize how important that is.

Often the problem statements are a bit vague, but given with some examples. It could be directly taken from wikipedia and ask you to implement it.

An anagram is a word or phrase formed by rearranging the letters of a different word or phrase.

https://en.wikipedia.org/wiki/Anagram

Or as it could be stated in you interview question: Given two strings, str_a and str_b, return whether they are an anagram of each other.

First, you need to understand the problem. Often there will be examples to show you what an anagram is.

An examples could include.

  • “Listen” and “Silent”

Where you notice that the letters in LISTEN are the exact same as in SILENT, just in a different order.

What you also notice is that the lowercase and uppercase of the letters do not matter.

The definition says also phrases, hence, examples of anagrams include.

  • “a gentleman” and “elegant man”
  • “eleven plus two” and “twelve plus one”
  • William Shakespeare” = “I am a weakish speller”

The last example is quite interesting, as it shows that the number of spaces do not seem to matter. The phrase “William Shakespeare” has 1 space, while the phrase “I am a weakish speller” has 4 spaces.

Now your part is to show that you understand it. In this case it would be good to state an input that will return True and one that will return False.

Say, “evil” and “vile” as input returns True, while “evil” and “test” will return False.

If you do not understand it, don’t worry. Imagine that you do not admit it. Then what is the best that can happen? Are you by any chance going to guess it out of the blue? Most likely not.

The best you can do is to admit it and explain as good as you can as much of what you think you understand.

Step 2: Find edge cases and set restrictions on input

A natural next step is to consider edge cases and set restrictions. Most would jump into writing some pseudo code and try to solve it. Don’t be that type of person. Take a deep breath and remember what we said in Step 1.

What did we observe in Step 1?

  1. That uppercase and lowercase is ignored when considering anagrams.
  2. That the number of spaces do not matter when considering anagrams.

Now this is important to notice.

An advise is to simplify as much as possible in a job interview. But you need to tell them that you do simplify the problem.

Hence, you can say.

  1. You assume that input will be all in lowercase.
  2. That input will not have spaces.

Is this reasonable? Well, it simplifies your code a lot. If they want you to consider these cases, they will let you know.

The importance of this step cannot be stressed enough. Because, if you just assumes that you can restrict input and write your code, they might now you have considered theses cases.

On the other hand, if you do set it as restrictions before hand they know you have thought of it.

Step 3: The naive solution

The next step is to make a solution. As you might not come up with the best solution first, a good way is to start with whatever comes to your mind and call it the naive solution.

It is always a good idea to write it down. This will help you get a better idea to solve it more efficient afterwards. Notice, that the restrictions you made in Step 2 comes in handy now, as you can construct an easy solution fast now.

How could a naive solution look like?

def is_anagram_naive(str_a, str_b):
    if len(str_a) != len(str_b):
        return False

    for c in str_a:
        if c not in str_b:
            return False
        else:
            str_b = str_b.replace(c, '', 1)
    return True


print(is_anagram_naive("silent", "listen"))
print(is_anagram_naive("elevenplustwo", "twelveplusone"))
print(is_anagram_naive("anna", "anaa"))

The question is how does it perform?

Well, first the check of the length of the string should be in constant time, O(1). If the underlying interpreter does not have an efficient implementation, then it could be O(n) time, where n is the length of str_a and str_b.

The loop over str_a is O(n). In each loop it will check if the character c is in str_b. This check is implemented to be faster than brute force. In an job interview you are not expected to remember that. It is good enough to say O(n) in first iteration.

Then we notice that it will use n, n-1, n-2, …, 3, 2, 1 operations in total to do this test. This results to n(n-1)/2 operations, which is O(n^2) in total.

On top of that, it will use the same to replace the character c in str_b.

That means the algorithm uses O(n^2) time complexity in worst case.

Step 3: Improving the performance

This is where it gets interesting. Now you showed you can solve the problem. The next step is to show that you can improve it.

How to get an idea?

Good you asked.

It lies in the previous step. What is the time complexity. You need to improve on the performance. What is often better than O(n^2)?

Yes, a natural next improvement would be O(n log(n)). What algorithms do you know? Well, sorting, can that help?

What if we sort the characters and compare them in order, then they must be identical if it is an anagram. Try to sort the characters of any anagram. Example, “silent” becomes [‘e’, ‘i’, ‘l’, ‘n’, ‘s’, ‘t’], and “listen” becomes [‘e’, ‘i’, ‘l’, ‘n’, ‘s’, ‘t’], identical.

This makes sense. As they have the same characters, they must come in the same order if we have them sorted.

Let’s try.

def is_anagram_improved(str_a, str_b):
    if len(str_a) != len(str_b):
        return False
    list_a = list(str_a)
    list_b = list(str_b)
    list_a.sort()
    list_b.sort()
    for l1, l2 in zip(list_a, list_b):
        if l1 != l2:
            return False
    return True

We start by checking if the length of the strings are the same, if not, return False, as it cannot be anagrams.

Then convert the strings to lists. It should take O(n) time. Now for the expensive step. Sorting it. This takes O(n log(n)) time.

Finally we check the characters one by one, in sorted order, if they are equal. If not, we do not have an anagram. This takes O(n) time.

That is, it uses O(n) + O(n log(n)) + O(n) = O(n log(n)) time.

You did the first improvement.

Can you do better? I know, this is annoying. But maybe we can. Let’s try.

Step 4: Improving the code – a common pitfall

Well, you are a shark at Python. Hence you know you that the above can be made into a more Pythonic code.

You feel on fire.

Yes, this can be done smarter, you say.

def is_anagram_pythonic_way(str_a, str_b):
    return sorted(str_a) == sorted(str_b)

It does actually do the same and in only one line of code. Isn’t that what coding is all about. You showcase how good you are and how much you know?

Well, both yes an no.

Yes, because it does show you are familiar with coding in Python.

No, because it did not improve the algorithmic time complexity. It still runs in O(n log(n)).

So what is gained? What did you tell the interviewer?

They are looking for improvement of the algorithmic time complexity or space complexity. Hence, showing them a smarter way to write the code does not add many points to the book.

I know it feels awesome to write a one-liner that does the same, but that is not what they are looking for.

The advice is, limit your time on improving lines of code, if it does not improve the algorithmic time complexity or space complexity.

Step 5: Improving the performance again

There is often a even better way to solve this things.

How do you get an idea to do that? Well, look at what takes time in the algorithm from Step 3. It is the sorting. Without the sorting step, you could have an algorithm in O(n) time.

Can we do something different than sorting?

What does the sorting do for us? It arranges the characters in order, repeating each character with the number of occurrences.

What if we had a table and just updated the number of occurrences of each character?

Let’s try.

def is_anagram_further_improvements(str_a, str_b):
    occurrences_a = [0]*256
    occurrences_b = [0]*256
    for c in str_a:
        occurrences_a[ord(c)] += 1
    for c in str_b:
        occurrences_b[ord(c)] += 1
    for i1, i2 in zip(occurrences_a, occurrences_b):
        if i1 != i2:
            return False
    return True

First notice we just assume 256 characters, which is not needed for the assumption. It just simplifies some calculations of the characters. In a job interview they are not looking for the optimal final solution, just if you understand the concepts of big-O time complexity.

The initialization of the size 256 occurrences counters is independent of the size, hence O(1) time.

Then the following two for-loops take O(n) time. Finally, the last for-loop takes O(256) = O(1) time.

This is a total of O(n) time complexity of the algorithm.

See, we could do some awesome stuff together.

Conclusion

Is this the perfect written solution? No, this is a job interview. Time is limited. It is important to show what they are looking for.

Do you understand big-O time complexity. Do you see how a simple solution can be improved?

The anagram is one of the favorite coding interview problems, because it has this natural flow of improving it. They do not expect you to remember the optimal solution. But they expect you to reason your way forward as we did in this tutorial.

Avoid Common Object Oriented Programming Pitfalls

What will we cover in this tutorial?

I did them myself. When you first learn about object oriented programming, you get exited about it and want to turn everything into objects.

Very often it fails to make the code easier to maintain and understand. Yes, that is one core goal of creating object oriented programming, to make it easier to maintain and understand.

Often we just turn the traditional way of programming into objects. But that is not the point. The point is to model it in an object oriented way.

Before that, let’s just look at why we use object oriented programming.

Step 1: Why use object oriented programming?

When introducing object oriented programming to beginners it often introduces too many concepts at once.

Simply because object oriented programming is awesome and can do so many things.

Let’s just start simple with the core of the object oriented programming idea.

We want to make your program easier to understand and maintain.

That is the goal for any programmer. Also, when it comes to object oriented programming.

Object oriented programming tries to make the link between the real world and the programming world as close as possible. We humans understand the objects in the real world better than we understand how a computer pushes bits and bytes around in the memory and CPU.

Luckily, we do not need to understand all that. We just need to understand how to program the computer through a programming language.

Object oriented programming tries to make that easier with modeling the programs with objects, which are related to the way we humans understand things.

Let’s try with a simple example of a Stack.

A Stack has a top and bottom, it has operations push and pop.

How would you model the above without using object oriented programming.

stack_size = 8
stack = [None]*stack_size
top = -1 # empty stack

def pop():
    global top, stack
    if top == -1:
        return None
    else:
        top -= 1
        return stack[top + 1]

def push(element):
    global top, stack, stack_size
    if top + 1 >= stack_size:
        stack += [None]*stack_size
        stack_size *= 2
    top += 1
    stack[top] = element

def print_stack():
    global top, stack, stack_size
    for i in range(top + 1):
        print(stack[i], '', end='')
    print(" (size: " + str(stack_size) + ")")


print_stack()
for i in range(10):
    push(i)
print_stack()
for i in range(8):
    pop()
print_stack()

That is confusing code, right?

First of all, we use global variables. That makes the function calls hard to understand, as they have side effects.

This is made more confusing than necessary. It does use Python lists like an Array. That is not needed, but it is just to exemplify how difficult it is to make intuitive code if you model the world like a computer works.

Step 2: Using object oriented programming to solve the above (first step – the less bad, but still not good solution)

First time you are asked to create stack using an object oriented approach, you will probably do it in a non intuitive way.

Say, you think of a Stack like a object. The stack is the full object.

What does a stack consists of?

Well items which are piled on top of each other, and you can take the top off.

How could that be modeled?

Often the straight forward way from a classical thinking way into the a class.

class Stack:
    def __init__(self):
        self.stack_size = 8
        self.stack = [None]*self.stack_size
        self.top = -1 # empty stack

    def pop(self):
        if self.top == -1:
            return None
        else:
            self.top -= 1
            return self.stack[self.top + 1]

    def push(self, element):
        if self.top + 1 >= self.stack_size:
            self.stack += [None]*self.stack_size
            self.stack_size *= 2
        self.top += 1
        self.stack[self.top] = element

    def print_stack(self):
        for i in range(self.top + 1):
            print(self.stack[i], '', end='')
        print(" (size: " + str(self.stack_size) + ")")


s = Stack()
s.print_stack()
for i in range(10):
    s.push(i)
s.print_stack()
for i in range(8):
    s.pop()
s.print_stack()

If you inspect the code, it is actually the same code. Which in some ways has improved it.

  1. We do not have global variables anymore. They are tied to the Stack class.
  2. The function calls push and pop are also tied to the Stack class.
  3. Also, when we use the Stack, it is clear from the context, as it is tied to the variable s, in this case.

So what is wrong?

Well, it is still not simple to understand what happens in the code. It takes time to understand, even with this simple code.

The functions use variables like top and assigns it to -1 if the stack is empty. It requires you to investigate pop and the constructor to understand that.

The key is to keep it simple.

So how to do that?

Step 3: A simple way to model it

Let’s take a look at the drawing again.

A stack.

A stack actually consists of object on it. Hence, the stack is the abstraction that keeps objects in a specific order.

That said, we need to model that closer, as it will be easier to understand for the reader of the code.

The objects on the stack we will call Nodes.

What characterizes a Node? It lies either on the bottom or on top of another Node.

What characterizes a Stack? It knows the top of the stack and can push and pop Nodes.

How can that be turned into code?

class Node:
    def __init__(self, element=None, next_node=None):
        self.element = element
        self.next_node = next_node


class Stack:
    def __init__(self):
        self.top = None

    def push(self, element):
        self.top = Node(element, self.top)

    def pop(self):
        element = self.top.element
        self.top = self.top.next_node
        return element


s = Stack()
for i in range(20):
    s.push(i)
for i in range(10):
    print(s.pop(), '', end='')
print()

How is that code? Easier to understand?

Of course, normally a Stack would as a minimum have a helper function is_empty() to return if stack is empty. That can be added easily. You see how?

class Node:
    def __init__(self, element=None, next_node=None):
        self.element = element
        self.next_node = next_node


class Stack:
    def __init__(self):
        self.top = None

    def push(self, element):
        self.top = Node(element, self.top)

    def pop(self):
        element = self.top.element
        self.top = self.top.next_node
        return element

    def is_empty(self):
        return self.top == None


s = Stack()
for i in range(20):
    s.push(i)
while not s.is_empty():
    print(s.pop(), '', end='')
print()

Do you get the sense of it now?

Model the code as closely to the reality we humans understand. It makes the code easy to read and understand. Hence, easy to maintain.