How to Reformat a Text File in Python

The input file and the desired output

The task is to reformat the following input format.

Computing
“I do not fear computers. I fear lack of them.”
— Isaac Asimov
“A computer once beat me at chess, but it was no match for me at kick boxing.”
— Emo Philips
“Computer Science is no more about computers than astronomy is about telescopes.”
— Edsger W. Dijkstra

To the following output format.

“I do not fear computers. I fear lack of them.” (Isaac Asimov)
“A computer once beat me at chess, but it was no match for me at kick boxing.” (Emo Philips)
“Computer Science is no more about computers than astronomy is about telescopes.” (Edsger W. Dijkstra)

The Python code doing the job

The following simple code could do the reformatting in less than a second for a file that contained multiple hundreds quotes.

file = open("input")
content = file.readlines()
file.close()
lines = []
next_line = ""
for line in content:
    line = line.strip()
    if len(line) > 0 and len(line.split()) > 1:
        if line[0] == '“':
            next_line = line
        elif line[0] == '—':
            next_line += " (" + line[2:] + ")"
            lines.append(next_line)
            next_line = ""

file = open("output", "w")
for line in lines:
    file.write(line + "\n")
file.close()

How to Fetch CNN Breaking Tweets and Make Simple Statistics Automated with Python

What will we cover

  • We will use the tweepy library
  • Read the newest tweets from CNN Breaking
  • Make simple word statistics on the news tweets
  • See if we can learn anything from it

Preliminaries

The Code that does the magic

import tweepy
# personal details insert your key, secret, token and token_secret here
consumer_key = ""
consumer_secret = ""
access_token = ""
access_token_secret = ""
# authentication of consumer key and secret
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
# authentication of access token and secret
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
# Creation of the actual interface, using authentication
api = tweepy.API(auth)
# Use a dictionary to count the appearances of words
stat = {}
# Read the tweets from @cnnbrk and make the statistics
for status in tweepy.Cursor(api.user_timeline, screen_name='@cnnbrk', tweet_mode="extended").items():
    for word in status.full_text.split():
        if word in stat:
            stat[word] += 1
        else:
            stat[word] = 1
# Let's just print the top 10
top = 10
# Let us sort them on the value in reverse order to get the highest first
for word in sorted(stat, key=stat.get, reverse=True):
    # leave out all the small words
    if len(word) > 6:
        print(word, stat[word])
        top -= 1
        if top < 0:
            break

The result of the above (done May 30th, 2020)

coronavirus 441
@CNNPolitics: 439
President 380
updates: 290
impeachment 148
officials 130
according 100
Trump's 98
Democratic 96
against 88
Department 83

The coronavirus is still the most breaking subject of today.

Next steps

  • It should be extended to have a more intelligent interpretation of the data.

Understand the Password Validation in Mac in 3 Steps – Implement the Validation in Python

What will you learn?

  • The password validation process in Mac
  • How to extract the password validation values
  • Implementing the check in Python
  • Understand why the values are as they are
  • The importance of using a salt value with the password
  • Learn why the hash function is iterated multiple times

The Mac password validation process

Every time you log into your Mac it needs to verify that you used the correct password before giving you access.

The validation process reads hash, salt and iteration values from storage and uses them to validate your password.

The 3 steps below helps you to locate your values and how the validation process is done.

Step 1: Locating and extracting the hash, salt and iteration values

You need to use a terminal to extract the values. By using the following command you should get it printed in a readable way.

sudo defaults read /var/db/dslocal/nodes/Default/users/<username>.plist ShadowHashData | tr -dc 0-9a-f | xxd -r -p | plutil -convert xml1 - -o -

Where you need to exchange <username> with your actual user name. The command will prompt you for admin password.

This should result in an output similar to this.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
	<key>SALTED-SHA512-PBKDF2</key>
	<dict>
		<key>entropy</key>
		<data>
                1meJW2W6Zugz3rKm/n0yysV+5kvTccA7EuGejmyIX8X/MFoPxmmbCf3BE62h
                6wGyWk/TXR7pvXKg\njrWjZyI+Fc3aKfv1LNQ0/Qrod3lVJcWd9V6Ygt+MYU
                8Eptv3uwDcYf6Z5UuF+Hg67rpoDAWhJrC1\nPEfL3vcN7IoBqC5NkIU=
		</data>
		<key>iterations</key>
		<integer>45454</integer>
		<key>salt</key>
		<data>
		6VuJKkHVTdDelbNMPBxzw7INW2NkYlR/LoW4OL7kVAI=
		</data>
	</dict>
</dict>
</plist>

Step 2: Understand the output

The output consists of four pieces.

  • Key value: SALTED-SHA512-PBKDF2
  • Entropy: Base64 encoded data.
  • Number of iteration: 45454
  • Salt: Base64 encoded data

The Key value is the tells you which algorithm is used (SHA512) and how it is used (PBKDF2).

The entropy is the actual result of the validation algorithm determined by the key value . This “value” is not an encryption of the password, which means you cannot recover the password from that value, but you can validate if the password matches this value.

Confused? I know. But you will understand when we implement the solution

The number of iterations, here 45454, is the number of times the hash function is called. Also, why would you call the hash function multiple times? Follow along and you will see.

Finally, we have the salt value. That is to ensure that you cannot determine the password from the entropy value itself. This will also get explained with example below.

Step 3: Validating the password with Python

Before we explain the above, we need to be have Python do the check of the password.

import hashlib
import base64
iterations = 45454
salt = base64.b64decode("6VuJKkHVTdDelbNMPBxzw7INW2NkYlR/LoW4OL7kVAI=".encode())
password = "password".encode()
value = hashlib.pbkdf2_hmac('sha512', password, salt, iterations, 128)
print(base64.b64encode(value))

Which will generate the following output

b'1meJW2W6Zugz3rKm/n0yysV+5kvTccA7EuGejmyIX8X/MFoPxmmbCf3BE62h6wGyWk/TXR7pvXKgjrWjZyI+Fc3aKfv1LNQ0/Qrod3lVJcWd9V6Ygt+MYU8Eptv3uwDcYf6Z5UuF+Hg67rpoDAWhJrC1PEfL3vcN7IoBqC5NkIU='

That matches the entropy content of the file.

So what happened in the above Python code?

We use the hashlib library to do all the work for us. It takes the algorithm (sha512), the password (Yes, I used the password ‘password’ in this example, you should not actually use that for anything you want to keep secret from the public), the salt and the number of iterations.

Now we are ready to explore the questions.

Why use a Hash value and not an encryption of the password?

If the password was encrypted, then an admin on your network would be able to decrypt it and misuse it.

Hence, to keep it safe from that, an iterated hash value of your password is used.

A hash function is a one-way function that can map any input to a fixed sized output. A hash function will have these important properties in regards to passwords.

  • It will always map the same input to the same output. Hence, your password will always be mapped to the same value.
  • A small change in the input will give a big change in output. Hence, if you change one character in the password (say, from ‘password’ to ‘passward’) the hash value will be totally different.
  • It is not easy to find the given input to a hash value. Hence, it is not easily feasible to find your password given the hash value.

Why use multiple iterations of the hash function?

To slow it down.

Basically, the way your find passwords is by trying all possibilities. You try ‘a’ and map it to check if that gives the password. Then you try ‘b’ and see.

If that process is slow, you decrease the odds of someone finding your password.

To demonstrate this we can use the cProfile library to investigate the difference in run-time. First let us try it with the 45454 iterations in the hash function.

import hashlib
import base64
import cProfile

def crack_password(entropy, iterations, salt):
    alphabet = "abcdefghijklmnopqrtsuvwxyz"
    for c1 in alphabet:
        for c2 in alphabet:
            password = str.encode(c1 + c2)
            value = base64.b64encode(hashlib.pbkdf2_hmac('sha512', password, salt, iterations, 128))
            if value == entropy:
                return password

entropy = "kRqabDBsvkyAhpzzVWJtdqbtqgkgNPwr5gqWG6jvw73hxc7CCvC4E33WyR5bxKmAXG5vAG9/ue+DC7BYLHRfOTE/dLKSMdpE9RFH7ZlTp7GHdH5b5vaqQCcKlXAwkky786zvpucDIgGGTOyw6kKB5hqIXLX9chDvcPQksVrjmUs=".encode()
iterations = 45454
salt = base64.b64decode("6VuJKkHVTdDelbNMPBxzw7INW2NkYlR/LoW4OL7kVAI=".encode())
cProfile.run("crack_password(entropy, iterations, salt)")

This results in a run time of.

        1    0.011    0.011   58.883   58.883 ShadowFile.py:6(crack_password)

About 1 minute.

If we change the number of iterations to 1.

import hashlib
import base64
import cProfile

def crack_password(entropy, iterations, salt):
    alphabet = "abcdefghijklmnopqrtsuvwxyz"
    for c1 in alphabet:
        for c2 in alphabet:
            password = str.encode(c1 + c2)
            value = base64.b64encode(hashlib.pbkdf2_hmac('sha512', password, salt, iterations, 128))
            if value == entropy:
                return password

entropy = "kRqabDBsvkyAhpzzVWJtdqbtqgkgNPwr5gqWG6jvw73hxc7CCvC4E33WyR5bxKmAXG5vAG9/ue+DC7BYLHRfOTE/dLKSMdpE9RFH7ZlTp7GHdH5b5vaqQCcKlXAwkky786zvpucDIgGGTOyw6kKB5hqIXLX9chDvcPQksVrjmUs=".encode()
iterations = 1
salt = base64.b64decode("6VuJKkHVTdDelbNMPBxzw7INW2NkYlR/LoW4OL7kVAI=".encode())
cProfile.run("crack_password(entropy, iterations, salt)")

I guess you are not surprised it takes less than 1 second.

        1    0.002    0.002    0.010    0.010 ShadowFile.py:6(crack_password)

Hence, you can check way more passwords if only iterated 1 time.

Why use a Salt?

This is interesting.

Well, say that another user used the password ‘password’ and there was no salt.

import hashlib
import base64
iterations = 45454
salt = base64.b64decode("".encode())
password = "password".encode()
value = hashlib.pbkdf2_hmac('sha512', password, salt, iterations, 128)
print(base64.b64encode(value))
b'kRqabDBsvkyAhpzzVWJtdqbtqgkgNPwr5gqWG6jvw73hxc7CCvC4E33WyR5bxKmAXG5vAG9/ue+DC7BYLHRfOTE/dLKSMdpE9RFH7ZlTp7GHdH5b5vaqQCcKlXAwkky786zvpucDIgGGTOyw6kKB5hqIXLX9chDvcPQksVrjmUs='

Then you would get the same hash value.

Hence, for each user password, there is a new random salt used.

How to proceed from here?

If you want to crack passwords, then I would recommend you use Hashcat.