What will we cover in this tutorial?
Wonder why some of these trivial questions are popular in coding interviews?
The truths is, they have a depth of hidden aspects which uncovers whether you understand the the true nature of the problem.
Often, there is no right and wrong solution. It depends. Can you answer what it depends on?
In this tutorial we will investigate the Anagram Problem.
What they are really looking for is your underlying understanding of algorithmic time complexity (big-O analysis) and your ability to improve on solutions.
This tutorial will show one example of this.
Step 1: What is the Anagram Problem?
The first part of a coding challenge problem is to show that you understand it. Many underestimate this part and do not realize how important that is.
Often the problem statements are a bit vague, but given with some examples. It could be directly taken from wikipedia and ask you to implement it.
An anagram is a word or phrase formed by rearranging the letters of a different word or phrase.https://en.wikipedia.org/wiki/Anagram
Or as it could be stated in you interview question: Given two strings, str_a and str_b, return whether they are an anagram of each other.
First, you need to understand the problem. Often there will be examples to show you what an anagram is.
An examples could include.
- “Listen” and “Silent”
Where you notice that the letters in LISTEN are the exact same as in SILENT, just in a different order.
What you also notice is that the lowercase and uppercase of the letters do not matter.
The definition says also phrases, hence, examples of anagrams include.
- “a gentleman” and “elegant man”
- “eleven plus two” and “twelve plus one”
- “William Shakespeare” = “I am a weakish speller”
The last example is quite interesting, as it shows that the number of spaces do not seem to matter. The phrase “William Shakespeare” has 1 space, while the phrase “I am a weakish speller” has 4 spaces.
Now your part is to show that you understand it. In this case it would be good to state an input that will return True and one that will return False.
Say, “evil” and “vile” as input returns True, while “evil” and “test” will return False.
If you do not understand it, don’t worry. Imagine that you do not admit it. Then what is the best that can happen? Are you by any chance going to guess it out of the blue? Most likely not.
The best you can do is to admit it and explain as good as you can as much of what you think you understand.
Step 2: Find edge cases and set restrictions on input
A natural next step is to consider edge cases and set restrictions. Most would jump into writing some pseudo code and try to solve it. Don’t be that type of person. Take a deep breath and remember what we said in Step 1.
What did we observe in Step 1?
- That uppercase and lowercase is ignored when considering anagrams.
- That the number of spaces do not matter when considering anagrams.
Now this is important to notice.
An advise is to simplify as much as possible in a job interview. But you need to tell them that you do simplify the problem.
Hence, you can say.
- You assume that input will be all in lowercase.
- That input will not have spaces.
Is this reasonable? Well, it simplifies your code a lot. If they want you to consider these cases, they will let you know.
The importance of this step cannot be stressed enough. Because, if you just assumes that you can restrict input and write your code, they might now you have considered theses cases.
On the other hand, if you do set it as restrictions before hand they know you have thought of it.
Step 3: The naive solution
The next step is to make a solution. As you might not come up with the best solution first, a good way is to start with whatever comes to your mind and call it the naive solution.
It is always a good idea to write it down. This will help you get a better idea to solve it more efficient afterwards. Notice, that the restrictions you made in Step 2 comes in handy now, as you can construct an easy solution fast now.
How could a naive solution look like?
def is_anagram_naive(str_a, str_b): if len(str_a) != len(str_b): return False for c in str_a: if c not in str_b: return False else: str_b = str_b.replace(c, '', 1) return True print(is_anagram_naive("silent", "listen")) print(is_anagram_naive("elevenplustwo", "twelveplusone")) print(is_anagram_naive("anna", "anaa"))
The question is how does it perform?
Well, first the check of the length of the string should be in constant time, O(1). If the underlying interpreter does not have an efficient implementation, then it could be O(n) time, where n is the length of str_a and str_b.
The loop over str_a is O(n). In each loop it will check if the character c is in str_b. This check is implemented to be faster than brute force. In an job interview you are not expected to remember that. It is good enough to say O(n) in first iteration.
Then we notice that it will use n, n-1, n-2, …, 3, 2, 1 operations in total to do this test. This results to n(n-1)/2 operations, which is O(n^2) in total.
On top of that, it will use the same to replace the character c in str_b.
That means the algorithm uses O(n^2) time complexity in worst case.
Step 3: Improving the performance
This is where it gets interesting. Now you showed you can solve the problem. The next step is to show that you can improve it.
How to get an idea?
Good you asked.
It lies in the previous step. What is the time complexity. You need to improve on the performance. What is often better than O(n^2)?
Yes, a natural next improvement would be O(n log(n)). What algorithms do you know? Well, sorting, can that help?
What if we sort the characters and compare them in order, then they must be identical if it is an anagram. Try to sort the characters of any anagram. Example, “silent” becomes [‘e’, ‘i’, ‘l’, ‘n’, ‘s’, ‘t’], and “listen” becomes [‘e’, ‘i’, ‘l’, ‘n’, ‘s’, ‘t’], identical.
This makes sense. As they have the same characters, they must come in the same order if we have them sorted.
def is_anagram_improved(str_a, str_b): if len(str_a) != len(str_b): return False list_a = list(str_a) list_b = list(str_b) list_a.sort() list_b.sort() for l1, l2 in zip(list_a, list_b): if l1 != l2: return False return True
We start by checking if the length of the strings are the same, if not, return False, as it cannot be anagrams.
Then convert the strings to lists. It should take O(n) time. Now for the expensive step. Sorting it. This takes O(n log(n)) time.
Finally we check the characters one by one, in sorted order, if they are equal. If not, we do not have an anagram. This takes O(n) time.
That is, it uses O(n) + O(n log(n)) + O(n) = O(n log(n)) time.
You did the first improvement.
Can you do better? I know, this is annoying. But maybe we can. Let’s try.
Step 4: Improving the code – a common pitfall
Well, you are a shark at Python. Hence you know you that the above can be made into a more Pythonic code.
You feel on fire.
Yes, this can be done smarter, you say.
def is_anagram_pythonic_way(str_a, str_b): return sorted(str_a) == sorted(str_b)
It does actually do the same and in only one line of code. Isn’t that what coding is all about. You showcase how good you are and how much you know?
Well, both yes an no.
Yes, because it does show you are familiar with coding in Python.
No, because it did not improve the algorithmic time complexity. It still runs in O(n log(n)).
So what is gained? What did you tell the interviewer?
They are looking for improvement of the algorithmic time complexity or space complexity. Hence, showing them a smarter way to write the code does not add many points to the book.
I know it feels awesome to write a one-liner that does the same, but that is not what they are looking for.
The advice is, limit your time on improving lines of code, if it does not improve the algorithmic time complexity or space complexity.
Step 5: Improving the performance again
There is often a even better way to solve this things.
How do you get an idea to do that? Well, look at what takes time in the algorithm from Step 3. It is the sorting. Without the sorting step, you could have an algorithm in O(n) time.
Can we do something different than sorting?
What does the sorting do for us? It arranges the characters in order, repeating each character with the number of occurrences.
What if we had a table and just updated the number of occurrences of each character?
def is_anagram_further_improvements(str_a, str_b): occurrences_a = *256 occurrences_b = *256 for c in str_a: occurrences_a[ord(c)] += 1 for c in str_b: occurrences_b[ord(c)] += 1 for i1, i2 in zip(occurrences_a, occurrences_b): if i1 != i2: return False return True
First notice we just assume 256 characters, which is not needed for the assumption. It just simplifies some calculations of the characters. In a job interview they are not looking for the optimal final solution, just if you understand the concepts of big-O time complexity.
The initialization of the size 256 occurrences counters is independent of the size, hence O(1) time.
Then the following two for-loops take O(n) time. Finally, the last for-loop takes O(256) = O(1) time.
This is a total of O(n) time complexity of the algorithm.
See, we could do some awesome stuff together.
Is this the perfect written solution? No, this is a job interview. Time is limited. It is important to show what they are looking for.
Do you understand big-O time complexity. Do you see how a simple solution can be improved?
The anagram is one of the favorite coding interview problems, because it has this natural flow of improving it. They do not expect you to remember the optimal solution. But they expect you to reason your way forward as we did in this tutorial.