What will we cover?
How to combine data from different DataFrames into one DataFrame.
Three methods are commonly used:
- pd.concat([df1, df2], axis=0): concatenates pandas objects along a particular axis.
- df.join(other.set_index('key'), on='key'): joins columns of another DataFrame.
- df1.merge(df2, how='inner', on='a'): merges DataFrame or named Series objects with a database-style join.
Also see the pandas cheat sheet for details.
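To make the three methods concrete, here is a minimal sketch on tiny, hypothetical DataFrames (df1, df2 and the lookup table other are invented for illustration). It stacks rows with concat, then attaches a label column once with join and once with merge.

```python
import pandas as pd

# Two small, hypothetical DataFrames with the same columns (for concat)
df1 = pd.DataFrame({'key': ['A', 'B'], 'value': [1, 2]})
df2 = pd.DataFrame({'key': ['C', 'D'], 'value': [3, 4]})

# concat stacks the rows (axis=0); ignore_index=True renumbers the index 0..n-1
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)

# A hypothetical lookup table that shares the 'key' column
other = pd.DataFrame({'key': ['A', 'B', 'C'], 'label': ['x', 'y', 'z']})

# join matches the 'key' column of the caller against the INDEX of other
# (hence set_index); it defaults to a left join
joined = stacked.join(other.set_index('key'), on='key')

# merge matches on a column present in both DataFrames;
# how='inner' keeps only keys found in both
merged = stacked.merge(other, how='inner', on='key')

print(stacked)
print(joined)
print(merged)
```

Because join defaults to a left join, the unmatched key 'D' keeps its row with a missing label, while the inner merge drops it.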
Step 1: Getting some demonstration data
In this tutorial we will take some data and its metadata and combine them into one DataFrame. The data comes from the World Bank database.
The data can be downloaded directly from the World Bank or from my GitHub. Here we access it directly from GitHub.
```python
import pandas as pd

# Population data; skiprows=4 skips the preamble lines above the header row
data_file = 'https://raw.githubusercontent.com/LearnPythonWithRune/DataScienceWithPython/main/files/API_SP/API_SP.POP.TOTL_DS2_en_csv_v2_3158886.csv'
data = pd.read_csv(data_file, skiprows=4)

# Country metadata for the same data set
meta_file = 'https://raw.githubusercontent.com/LearnPythonWithRune/DataScienceWithPython/main/files/API_SP/Metadata_Country_API_SP.POP.TOTL_DS2_en_csv_v2_3158886.csv'
meta = pd.read_csv(meta_file)
```
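Why skiprows=4? The argument suggests the population file has four lines before its column header. The sketch below assumes a four-line preamble of that kind and reads a hypothetical miniature of the file (invented country and made-up values) from an in-memory string with io.StringIO, so it runs without a download.

```python
import io

import pandas as pd

# Hypothetical miniature of the file layout: four preamble lines
# (including blank ones) come before the real header row.
raw = (
    '"Data Source","World Development Indicators"\n'
    '\n'
    '"Last Updated Date","2021-06-30"\n'
    '\n'
    '"Country Name","Country Code","Indicator Name","Indicator Code","2019","2020"\n'
    '"Country A","AAA","Population, total","SP.POP.TOTL",100,110\n'
)

# skiprows=4 drops the first four lines (blank lines count too),
# so the fifth line becomes the header
data = pd.read_csv(io.StringIO(raw), skiprows=4)
print(data.columns.tolist())
```

Without skiprows, pandas would treat the first preamble line as the header and the real column names would end up as a data row.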
A snippet of the data DataFrame should look similar to this.
And a snippet of the metadata (meta).
Step 2: Combining the DataFrames with Merge
A convenient way to combine the data is merge, which is especially simple when both DataFrames have a column with the same name to join on (here, 'Country Code').
```python
dataset = data.merge(meta, how='inner', on='Country Code')
```
This shows the last columns of the new DataFrame, dataset.
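To see what how='inner' does with keys that have no match, here is a small sketch on hypothetical stand-ins for data and meta (the codes, column subset and values are invented). A left merge with indicator=True adds a _merge column, which is one way to check which rows an inner merge would drop.

```python
import pandas as pd

# Hypothetical stand-ins for the population data and the country metadata
data = pd.DataFrame({
    'Country Name': ['Country A', 'Country B', 'Country C'],
    'Country Code': ['AAA', 'BBB', 'CCC'],
    '2020': [100, 200, 300],
})
meta = pd.DataFrame({
    'Country Code': ['AAA', 'BBB'],
    'Region': ['Region X', 'Region Y'],
    'IncomeGroup': ['High income', 'Low income'],
})

# Inner merge: keeps only country codes present in BOTH tables
dataset = data.merge(meta, how='inner', on='Country Code')

# Left merge with indicator=True: keeps every row of data and adds a
# '_merge' column ('both' or 'left_only') telling whether meta had a match
check = data.merge(meta, how='left', on='Country Code', indicator=True)

# The last columns of dataset are the ones contributed by meta
print(dataset.iloc[:, -2:])
print(check[['Country Code', '_merge']])
```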
Step 3: Showing the new enriched data
Now we can work with the enriched dataset, which has the metadata columns (such as Region and IncomeGroup) added to the population data.
One thing you can do is use groupby.
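As a minimal sketch, assuming the merged dataset has a Region column from the metadata and a '2020' year column (the toy values below are invented, not the real figures), you can total the population per region like this:

```python
import pandas as pd

# Hypothetical merged dataset: population per country plus its region
dataset = pd.DataFrame({
    'Country Code': ['AAA', 'BBB', 'CCC', 'DDD'],
    'Region': ['Region X', 'Region X', 'Region Y', 'Region Y'],
    '2020': [100, 200, 300, 400],
})

# Group the rows by region and sum the 2020 population within each group
population_by_region = dataset.groupby('Region')['2020'].sum()
print(population_by_region)
```

Swapping sum() for mean() or count() gives other per-region summaries with the same pattern.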
Want to learn more?
Want to learn more about Data Science to become a successful Data Scientist?
This is one lesson of a 15-part Expert Data Science Blueprint course with the following resources.
- 15 video lessons – cover the Data Science Workflow and concepts, demonstrate everything on real data, introduce projects, and show a solution (YouTube video).
- 30 Jupyter Notebooks – with the full code and explanation from the lectures and projects (GitHub).
- 15 projects – structured with the Data Science Workflow, with a solution explained at the end of the video lessons (GitHub).