    NumPy: Analyse Narcissistic Personality Indicator Numerical Dataset

    What is the Narcissistic Personality Indicator and how does it connect to NumPy?

    NumPy is an amazing library that makes analyzing data easy, especially numerical data.

    In this tutorial we are going to analyze a survey with 11,000+ respondents from an interactive Narcissistic Personality Indicator (NPI) test.

    Narcissism is a personality trait generally conceived of as excessive self-love. In Greek mythology, Narcissus was a man who fell in love with his own reflection in a pool of water.

    https://openpsychometrics.org/tests/NPI/

    The only connection between NPI and NumPy is that we want to analyze the 11,000+ answers.

    The dataset can be downloaded here. It consists of a comma-separated values file (CSV file for short) and a description.

    Step 1: Import the dataset and explore it

    NumPy has thought of this for us: loading the dataset (from the link above) takes a single line.

    import numpy as np
    # This line loads the 11,000+ rows of data into an ndarray
    data = np.genfromtxt('data.csv', delimiter=',', dtype='int')
    # Skip the first row (the header with the column names)
    data = data[1:]
    print(data)
    

    This prints a summary of the array.

    [[ 18   2   2 ... 211   1  50]
     [  6   2   2 ... 149   1  40]
     [ 27   1   2 ... 168   1  28]
     ...
     [  6   1   2 ... 447   2  33]
     [ 12   2   2 ... 167   1  24]
     [ 18   1   2 ... 291   1  36]]
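As a variation on the loading code above, `np.genfromtxt` also accepts a `skip_header` argument, so the header row never enters the array. The sketch below uses a tiny made-up CSV held in memory so that it runs without `data.csv`; the column names and values are placeholders, not the real dataset.

```python
import io
import numpy as np

# Made-up stand-in for data.csv: a header row plus three rows of integers
csv_text = """score,Q1,gender,age
18,2,1,50
6,2,1,40
27,1,2,28
"""

# skip_header=1 drops the header line while reading
data = np.genfromtxt(io.StringIO(csv_text), delimiter=',', dtype='int', skip_header=1)

print(data.shape)   # (number of rows, number of columns)
print(data[:, 0])   # the first column
```

Checking `data.shape` right after loading is a quick way to confirm that the number of rows and columns matches what the dataset description promises.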
    

    It is a good idea to open the file in a spreadsheet as well and take a look at it.

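    [Screenshot: the first columns of the dataset in a spreadsheet]

    And the far end.

    [Screenshot: the last columns of the dataset in a spreadsheet]

    Oh, that end.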

    Then read the description that comes with the dataset. Here is part of it.

    For questions 1-40 which choice they chose was recorded per the following key.
    ... [The questions Q1 ... Q40]
    ...
    gender. Chosen from a drop down list (1=male, 2=female, 3=other; 0=none was chosen).
    age. Entered as a free response. Ages below 14 have been omitted from the dataset.
    -- CALCULATED VALUES --
    elapse. (time submitted)-(time loaded) of the questions page in seconds.
    score. = ((int) $_POST['Q1'] == 1)
    ... [How it is calculated]
    

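    That means we have the score (column 0), the answers to the 40 questions (columns 1-40), the elapsed time (column 41), gender (column 42) and age (column 43).

    Reading a bit more, the description says that a high score is an indicator of narcissistic traits, but one should not conclude from the score alone that a person is a narcissist.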

    Step 2: Do men or women score highest on the NPI?

    I’m glad you asked.

    import numpy as np
    data = np.genfromtxt('data.csv', delimiter=',', dtype='int')
    # Skip first row
    data = data[1:]
    # Extract all the NPI scores (first column)
    npi_score = data[:,0]
    print("Average score", npi_score.mean())
    print("Men average", npi_score[data[:,42] == 1].mean())
    print("Women average", npi_score[data[:,42] == 2].mean())
    print("None average", npi_score[data[:,42] == 0].mean())
    print("Other average", npi_score[data[:,42] == 3].mean())
    

    Before looking at the result, notice how neatly the first column is sliced out into the view npi_score. Then notice how easily you can calculate the mean over a subset of the rows by using a condition to narrow the view.

    Average score 13.29965311749533
    Men average 14.195953307392996
    Women average 12.081829626521191
    None average 11.916666666666666
    Other average 14.85
    

    I guess you guessed it. On average, men score higher than women.
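The conditional means in this step rely on boolean masking. A minimal sketch on made-up numbers (not the survey data) shows what happens in each part of an expression such as `npi_score[data[:,42] == 1].mean()`:

```python
import numpy as np

# Made-up scores and gender codes (1=male, 2=female), not the survey data
scores = np.array([10, 20, 30, 40])
gender = np.array([1, 2, 1, 2])

# The comparison yields a boolean array, one True/False per respondent
is_male = gender == 1

# Indexing with the boolean array keeps only the entries where it is True
male_mean = scores[is_male].mean()
female_mean = scores[gender == 2].mean()

# Summing a boolean array counts the True values, i.e. the group size
male_count = is_male.sum()

print(male_mean, female_mean, male_count)
```

Printing the group size next to each average helps judge how much weight an average deserves, since a mean over a handful of respondents is far less stable than one over thousands.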

    Step 3: Is there a correlation between age and NPI score?

    I wonder about that too.

    How can we figure that out? Wait, let’s ask our new friend NumPy.

    import numpy as np
    import matplotlib.pyplot as plt
    data = np.genfromtxt('data.csv', delimiter=',', dtype='int')
    # Skip first row
    data = data[1:]
    # Extract all the NPI scores (first column)
    npi_score = data[:,0]
    age = data[:,43]
    # Some age values are not realistic (above 100), so we set them to 0
    age[age>100] = 0
    # Scatter plot them all with alpha=0.05
    plt.scatter(age, npi_score, color='r', alpha=0.05)
    plt.show()
    

    Resulting in.

    [Figure: scatter plot of age vs NPI score]

    That looks promising. But can we just conclude that younger people score higher on the NPI?

    What if most respondents are young? That alone would make the plot denser at the younger end (15-30). The danger of judging by eye is jumping to conclusions.
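One way to check how unevenly the ages are distributed is to count respondents per age bracket. Here is a minimal sketch with `np.histogram` on made-up ages; the bracket edges are arbitrary choices for illustration.

```python
import numpy as np

# Made-up ages, not the survey data
age = np.array([15, 18, 22, 25, 27, 35, 44])

# Count respondents per ten-year bracket: [10,20), [20,30), [30,40), [40,50]
counts, edges = np.histogram(age, bins=[10, 20, 30, 40, 50])

print(counts)
```

If one bracket holds most of the respondents, a dense cloud in that part of a scatter plot says more about who answered than about how they scored.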

    Luckily, NumPy can help us there as well.

    print("Correlation of NPI score and age:")
    print(np.corrcoef(npi_score, age))
    

    Resulting in.

    Correlation of NPI score and age:
    [[ 1.         -0.23414633]
     [-0.23414633  1.        ]]
    

    What does that mean? Well, looking at the documentation of np.corrcoef():

    Return Pearson product-moment correlation coefficients.

    https://numpy.org/doc/stable/reference/generated/numpy.corrcoef.html

    The off-diagonal value, -0.234, is the correlation between NPI score and age. It is negative, which means that younger respondents tend to score higher. Values between 0.0 and -0.3 are generally considered a weak correlation.
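The plotting code above sets implausible ages to 0, which keeps those rows in the correlation as if the respondents were 0 years old. A possible alternative, sketched here on made-up numbers, is to drop such rows with a boolean mask before calling `np.corrcoef`, and to read the single coefficient from position `[0, 1]` of the matrix. The cut-offs 14 and 100 are assumptions taken from the dataset description and the code above.

```python
import numpy as np

# Made-up values; 999 stands in for an implausible age entry
age = np.array([15, 20, 30, 45, 60, 999])
npi_score = np.array([22, 18, 15, 12, 9, 20])

# Keep only rows with a plausible age instead of overwriting them
plausible = (age >= 14) & (age <= 100)

# corrcoef returns a 2x2 matrix; the off-diagonal entry is the coefficient
r = np.corrcoef(npi_score[plausible], age[plausible])[0, 1]

print(r)
```

Whether this changes the result on the real survey depends on how many implausible ages it contains, so the coefficient printed earlier should not be assumed to carry over.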

    Is the Pearson product-moment correlation the correct one to use?
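Pearson's coefficient measures how well a straight line describes the relationship. One commonly used alternative is Spearman's rank correlation, which is Pearson's coefficient computed on the ranks of the values, so it only looks for a monotonic relationship. The sketch below is one way to compute it with NumPy alone; `average_ranks` and `spearman` are helper names made up for this illustration, and the data is made up as well.

```python
import numpy as np

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    values = np.asarray(values)
    order = np.argsort(values, kind="stable")
    sorted_values = values[order]
    # For each distinct value: where its run starts and how long the run is
    _, first_index, counts = np.unique(
        sorted_values, return_index=True, return_counts=True
    )
    # A run occupies 1-based ranks first_index+1 .. first_index+counts
    run_rank = first_index + (counts + 1) / 2.0
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.repeat(run_rank, counts)
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson's coefficient on the ranks."""
    return np.corrcoef(average_ranks(x), average_ranks(y))[0, 1]

# Made-up data: y grows with x, but not along a straight line
x = np.array([1, 2, 3, 4, 5])
y = x ** 3

pearson_r = np.corrcoef(x, y)[0, 1]
spearman_r = spearman(x, y)
print(pearson_r, spearman_r)
```

On this toy data the rank-based coefficient comes out higher than Pearson's because the relationship is perfectly monotonic but curved. Which measure suits the survey better is left open here, as in the question above.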

    Step 4: (Optional) Let’s try to see if there is a correlation between NPI score and time elapsed

    Same code, different column.

    import numpy as np
    import matplotlib.pyplot as plt
    
    data = np.genfromtxt('data.csv', delimiter=',', dtype='int')
    # Skip first row
    data = data[1:]
    # Extract all the NPI scores (first column)
    npi_score = data[:,0]
    elapse = data[:,41]
    # Cap elapsed times at 2000 seconds
    elapse[elapse > 2000] = 2000
    # Scatter plot them all with alpha=0.05
    plt.scatter(elapse, npi_score, color='r', alpha=0.05)
    plt.show()
    

    Resulting in.

    [Figure: scatter plot of time elapsed in seconds vs NPI score]

    Again, it is tempting to conclude something here. We need to remember that the mean score is around 13, so most of the data will lie around there.

    If we use the same calculation.

    print("Correlation of NPI score and time elapse:")
    print(np.corrcoef(npi_score, elapse))
    

    Output.

    Correlation of NPI score and time elapse:
    [[1.        0.0147711]
     [0.0147711 1.       ]]
    

    Hence, there is close to no correlation here.

    Conclusion

    Use the scientific tools to draw conclusions. Do not rely on your eyes to determine whether there is a correlation.

    The above gives an idea of how easy it is to work with numerical data in NumPy.
