
    What is a Substitution Cipher and is it Secure?

    What will you learn?

    • What is a Substitution Cipher?
    • How to implement the Substitution Cipher in Python
    • Understand the key space of Substitution Cipher
    • The weakness and how to break the Substitution Cipher

    What is a Substitution Cipher

    Imagine you receive this message: IQL WPV WCVJQEV

    What does it mean?

    You were told it is a Substitution Cipher, but how will that help you?

    First of all, we need to understand what a Substitution Cipher is. Basically, it replaces each letter with another letter according to a fixed mapping. That is, every time you write an A, you exchange it with, say, Q. And B with G. C with, hey, let’s keep the C.
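To make the idea concrete, here is a minimal sketch of the three example substitutions (A to Q, B to G, C kept as C) applied to a short word. The name `partial_key` is only illustrative, and the mapping is deliberately incomplete.

```python
# Partial mapping from the example: A -> Q, B -> G, C stays C.
partial_key = {"A": "Q", "B": "G", "C": "C"}

# str.maketrans builds a translation table from the dict;
# str.translate then swaps each character it finds in the table.
table = str.maketrans(partial_key)
encrypted = "CAB".translate(table)
print(encrypted)
```

A full key does the same for all 26 letters.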

    See the mapping in the picture below.

    Substitution Cipher Example

    That seems pretty solid. Right?

    But can we figure out what your message means?

    First, let’s try to implement a Substitution Cipher.

    Implementing Substitution Cipher in Python

    We will use the random library to generate random keys. We’ll get back to how many keys there are.

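    import random
    
    def generate_key():
        """Return a random one-to-one mapping of A-Z onto A-Z."""
        alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        chars = list(alphabet)
        key = {}
        for c in alphabet:
            # Pick a random unused letter and remove it from the pool,
            # so no two letters map to the same cipher letter.
            key[c] = chars.pop(random.randint(0, len(chars) - 1))
        return key
    
    def print_key(key):
        """Print the mapping one letter per line, in alphabetical order."""
        for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ":
            print(c, key[c])
    
    def encrypt(key, message):
        """Substitute every letter of message according to key."""
        cipher = ""
        for c in message:
            if c in key:
                cipher += key[c]
            else:
                # Anything not in the key (spaces, punctuation,
                # lowercase letters) is written as a space.
                cipher += " "
        return cipher
    
    def get_decrypt_key(key):
        """Invert the mapping: cipher letter -> plain letter."""
        dkey = {}
        for k in key:
            dkey[key[k]] = k
        return dkey
    
    key = generate_key()
    print(key)
    cipher = encrypt(key, "YOU ARE AWESOME")
    print(cipher)
    # Decrypting is the same substitution with the inverted key.
    dkey = get_decrypt_key(key)
    message = encrypt(dkey, cipher)
    print(message)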
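
The same idea can be written more compactly. The following is a minimal sketch, not a replacement for the code above: the helper names `make_key`, `apply_key` and `invert_key` are hypothetical, and unlike the version above it leaves characters outside the key unchanged instead of turning them into spaces. Keep in mind that Python's `random` module is not designed for cryptographic use, so this is for learning only.

```python
import random
import string

ALPHABET = string.ascii_uppercase

def make_key():
    # random.sample returns a shuffled copy of the alphabet,
    # i.e. a random permutation of the 26 letters.
    shuffled = random.sample(ALPHABET, len(ALPHABET))
    return dict(zip(ALPHABET, shuffled))

def apply_key(key, text):
    # Characters that are not in the key (spaces, punctuation)
    # pass through unchanged.
    return text.translate(str.maketrans(key))

def invert_key(key):
    # Swap keys and values to get the decryption mapping.
    return {v: k for k, v in key.items()}

key = make_key()
cipher = apply_key(key, "YOU ARE AWESOME")
plain = apply_key(invert_key(key), cipher)
print(cipher)
print(plain)
```

Encrypting and then decrypting with the inverted key should give back the original message.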
    

    Well, let’s understand the key-space. How secure is the Substitution Cipher?

    The key-space of the Substitution Cipher

    Let’s examine how the keys are generated.

    Substitution Cipher with key-space calculation

    The first letter, A, can be mapped to 26 possibilities (including A itself). The next letter, B, can be mapped to 25 possibilities. The third letter, C, can be mapped to (difficult to guess) 24 possibilities. And so forth.

    This generates a key space of 26 × 25 × 24 × … × 1 = 26! = 403291461126605635584000000 keys. That is right. It is more than 88 bits of security.

    That is a lot.

    88 bits of security? That means the key space holds more than 2^88 = 309485009821345068724781056 possibilities to try.
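
The numbers above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative.

```python
import math

# 26 choices for A, 25 for B, 24 for C, ... gives 26! possible keys.
key_space = math.factorial(26)

# The number of bits of security is the base-2 logarithm of the key space.
bits = math.log2(key_space)

print(key_space)
print(2 ** 88)
print(round(bits, 2))
```

The key space is a little larger than 2^88, which is why it is described as more than 88 bits.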

    A whole lot. And against an attacker who can only try every key, I promise you, that should be considered secure.

    But wait, the Substitution Cipher is not used anymore. Why? Read on, and you will know in a moment.

    The weakness of Substitution Cipher

    If the underlying language is English, then you can make a simple frequency analysis of how often the letters occur on average in English.

    It turns out to be quite simple.

    letter_freq = {'a': 0.0817, 'b': 0.0150, 'c': 0.0278, 'd': 0.0425, 'e': 0.1270, 'f': 0.0223,
                   'g': 0.0202, 'h': 0.0609, 'i': 0.0697, 'j': 0.0015, 'k': 0.0077, 'l': 0.0403,
                   'm': 0.0241, 'n': 0.0675, 'o': 0.0751, 'p': 0.0193, 'q': 0.0010, 'r': 0.0599,
                   's': 0.0633, 't': 0.0906, 'u': 0.0276, 'v': 0.0098, 'w': 0.0236, 'x': 0.0015,
                   'y': 0.0197, 'z': 0.0007}
    

    That is, the letter ‘a’ occurs with a probability of 8.17%, ‘b’ with 1.5%, and ‘c’ with 2.78%.

    So what about the text IQL WPV WCVJQEV?

    Well, we are out of luck: it is too short for a frequency analysis to have any significance.
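
To see why the short message gives so little to work with, here is a small sketch that counts its letters with `collections.Counter`. The variable names are illustrative.

```python
from collections import Counter

short_cipher = "IQL WPV WCVJQEV"

# Count letters only, ignoring the spaces.
counts = Counter(c for c in short_cipher if c.isalpha())
total = sum(counts.values())

print(total)
print(counts.most_common())
```

With only 13 letters, and the most common one appearing just 3 times, the counts cannot be compared meaningfully with the English averages.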

    But consider the following text.

    lrvmnir bpr sumvbwvr jx bpr lmiwv yjeryrkbi jx qmbm wi
    bpr xjvni mkd ymibrut jx irhx wi bpr riirkvr jx
    ymbinlmtmipw utn qmumbr dj w ipmhh but bj rhnvwdmbr bpr
    yjeryrkbi jx bpr qmbm mvvjudwko bj yt wkbrusurbmbwjk
    lmird jk xjubt trmui jx ibndt
      wb wi kjb mk rmit bmiq bj rashmwk rmvp yjeryrkb mkd wbi
    iwokwxwvmkvr mkd ijyr ynib urymwk nkrashmwkrd bj ower m
    vjyshrbr rashmkmbwjk jkr cjnhd pmer bj lr fnmhwxwrd mkd
    wkiswurd bj invp mk rabrkb bpmb pr vjnhd urmvp bpr ibmbr
    jx rkhwopbrkrd ywkd vmsmlhr jx urvjokwgwko ijnkdhrii
    ijnkd mkd ipmsrhrii ipmsr w dj kjb drry ytirhx bpr xwkmh
    mnbpjuwbt lnb yt rasruwrkvr cwbp qmbm pmi hrxb kj djnlb
    bpmb bpr xjhhjcwko wi bpr sujsru msshwvmbwjk mkd
    wkbrusurbmbwjk w jxxru yt bprjuwri wk bpr pjsr bpmb bpr
    riirkvr jx jqwkmcmk qmumbr cwhh urymwk wkbmvb
    

    This has enough letters for a frequency analysis. Let’s try.

    cipher = """lrvmnir bpr sumvbwvr jx bpr lmiwv yjeryrkbi jx qmbm wi
    bpr xjvni mkd ymibrut jx irhx wi bpr riirkvr jx
    ymbinlmtmipw utn qmumbr dj w ipmhh but bj rhnvwdmbr bpr
    yjeryrkbi jx bpr qmbm mvvjudwko bj yt wkbrusurbmbwjk
    lmird jk xjubt trmui jx ibndt
      wb wi kjb mk rmit bmiq bj rashmwk rmvp yjeryrkb mkd wbi
    iwokwxwvmkvr mkd ijyr ynib urymwk nkrashmwkrd bj ower m
    vjyshrbr rashmkmbwjk jkr cjnhd pmer bj lr fnmhwxwrd mkd
    wkiswurd bj invp mk rabrkb bpmb pr vjnhd urmvp bpr ibmbr
    jx rkhwopbrkrd ywkd vmsmlhr jx urvjokwgwko ijnkdhrii
    ijnkd mkd ipmsrhrii ipmsr w dj kjb drry ytirhx bpr xwkmh
    mnbpjuwbt lnb yt rasruwrkvr cwbp qmbm pmi hrxb kj djnlb
    bpmb bpr xjhhjcwko wi bpr sujsru msshwvmbwjk mkd
    wkbrusurbmbwjk w jxxru yt bprjuwri wk bpr pjsr bpmb bpr
    riirkvr jx jqwkmcmk qmumbr cwhh urymwk wkbmvb"""
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    freq = {}
    for c in alphabet:
        freq[c] = 0
    cnt = 0
    for c in cipher:
        if c in alphabet:
            freq[c] += 1
            cnt += 1
    for c in freq:
        freq[c] = round(freq[c]/cnt, 4)
    print(freq)
    

    This gives you the following output.

    {'a': 0.0077, 'b': 0.1053, 'c': 0.0077, 'd': 0.0356, 'e': 0.0077, 'f': 0.0015, 'g': 0.0015, 'h': 0.0356, 'i': 0.0635, 'j': 0.0743, 'k': 0.0759, 'l': 0.0124, 'm': 0.096, 'n': 0.0263, 'o': 0.0108, 'p': 0.0464, 'q': 0.0108, 'r': 0.13, 's': 0.0263, 't': 0.0201, 'u': 0.0372, 'v': 0.0341, 'w': 0.0728, 'x': 0.031, 'y': 0.0294, 'z': 0.0}
    

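    This gives you some hints on how the letters are mapped. For example, the most frequent cipher letter, ‘r’ (13%), is a good candidate for ‘e’ (12.7%), the most frequent letter in English, and the runner-up, ‘b’ (10.53%), is a good candidate for ‘t’ (9.06%).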
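
One way to turn those hints into a first guess is to rank the cipher letters and the English letters by frequency and pair them up in order. The following is a minimal sketch under that assumption. It is only a starting point: the top few pairs are usually the most reliable, and lower-ranked letters often have to be corrected by hand, for example by looking at short common words. The names `guess` and `partial` are illustrative.

```python
from collections import Counter

letter_freq = {'a': 0.0817, 'b': 0.0150, 'c': 0.0278, 'd': 0.0425, 'e': 0.1270, 'f': 0.0223,
               'g': 0.0202, 'h': 0.0609, 'i': 0.0697, 'j': 0.0015, 'k': 0.0077, 'l': 0.0403,
               'm': 0.0241, 'n': 0.0675, 'o': 0.0751, 'p': 0.0193, 'q': 0.0010, 'r': 0.0599,
               's': 0.0633, 't': 0.0906, 'u': 0.0276, 'v': 0.0098, 'w': 0.0236, 'x': 0.0015,
               'y': 0.0197, 'z': 0.0007}

cipher = """lrvmnir bpr sumvbwvr jx bpr lmiwv yjeryrkbi jx qmbm wi
bpr xjvni mkd ymibrut jx irhx wi bpr riirkvr jx
ymbinlmtmipw utn qmumbr dj w ipmhh but bj rhnvwdmbr bpr
yjeryrkbi jx bpr qmbm mvvjudwko bj yt wkbrusurbmbwjk
lmird jk xjubt trmui jx ibndt
  wb wi kjb mk rmit bmiq bj rashmwk rmvp yjeryrkb mkd wbi
iwokwxwvmkvr mkd ijyr ynib urymwk nkrashmwkrd bj ower m
vjyshrbr rashmkmbwjk jkr cjnhd pmer bj lr fnmhwxwrd mkd
wkiswurd bj invp mk rabrkb bpmb pr vjnhd urmvp bpr ibmbr
jx rkhwopbrkrd ywkd vmsmlhr jx urvjokwgwko ijnkdhrii
ijnkd mkd ipmsrhrii ipmsr w dj kjb drry ytirhx bpr xwkmh
mnbpjuwbt lnb yt rasruwrkvr cwbp qmbm pmi hrxb kj djnlb
bpmb bpr xjhhjcwko wi bpr sujsru msshwvmbwjk mkd
wkbrusurbmbwjk w jxxru yt bprjuwri wk bpr pjsr bpmb bpr
riirkvr jx jqwkmcmk qmumbr cwhh urymwk wkbmvb"""

# Rank cipher letters from most to least frequent.
counts = Counter(c for c in cipher if c in letter_freq)
cipher_ranked = [c for c, _ in counts.most_common()]

# Rank English letters the same way, using the table of averages.
english_ranked = sorted(letter_freq, key=letter_freq.get, reverse=True)

# Pair them up rank by rank: a first guess, not a solution.
guess = dict(zip(cipher_ranked, english_ranked))

# Apply only the three most confident guesses to the first line;
# every other letter is shown as an underscore.
partial = {c: guess[c] for c in cipher_ranked[:3]}
first_line = cipher.splitlines()[0]
print("".join(partial.get(c, " " if c == " " else "_") for c in first_line))
```

The word `bpr`, which appears many times in the text, comes out with its first and last letters filled in, which is consistent with it being a very common three-letter English word.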
