Data Manipulation
Modifying data
This is a data manipulation tutorial.
In Pandas, data manipulation refers to the process of modifying data. Pandas provides various data structures and functions for working with structured data. In this tutorial, we cover modification, conditional selection, and filtering of data. Manipulation is a broad term that also encompasses transforming and cleaning data, applying functions, and analysing data. For those topics, please refer to our other Pandas tutorials.
Pandas makes it straightforward to modify a DataFrame. Working with columns and rows involves tasks such as adding, deleting, and renaming them. Let’s explore each task separately.
Adding
Columns
You can add new columns to a DataFrame by simply assigning values to them.
    import pandas as pd

    # Sample DataFrame
    df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

    # Adding a new column
    df['C'] = [7, 8, 9]
    print(df)

       A  B  C
    0  1  4  7
    1  2  5  8
    2  3  6  9
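New columns are often derived from existing ones rather than typed in by hand. The following is a minimal sketch of that idea; the column names `Sum` and `Ratio` are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Derive a new column from existing ones (element-wise sum); modifies df
df['Sum'] = df['A'] + df['B']

# assign() returns a new DataFrame and leaves df unchanged
df_with_ratio = df.assign(Ratio=df['B'] / df['A'])

print(df)
print(df_with_ratio)
```

Direct assignment changes the DataFrame in place, while `assign()` is convenient when the original should stay untouched.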
Rows
You can add new rows to a DataFrame using the “pd.concat()” function, which is a top-level Pandas function rather than a DataFrame method.
    import pandas as pd

    # Sample DataFrame
    df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

    # Adding a new row
    new_row = pd.DataFrame({'A': [4], 'B': [7]})
    df = pd.concat([df, new_row], ignore_index=True)
    print(df)

       A  B
    0  1  4
    1  2  5
    2  3  6
    3  4  7
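To see what `ignore_index=True` does, it can help to compare the result with and without it. This is a small sketch; the variable names (`new_rows`, `kept`, `renumbered`) are illustrative only.

```python
import pandas as pd

# Sample DataFrame and two extra rows to append
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
new_rows = pd.DataFrame({'A': [4, 5], 'B': [7, 8]})

# Without ignore_index, each frame keeps its own labels, so labels repeat
kept = pd.concat([df, new_rows])
print(kept.index.tolist())

# With ignore_index=True, the result gets a fresh 0..n-1 index
renumbered = pd.concat([df, new_rows], ignore_index=True)
print(renumbered.index.tolist())
```

Duplicate index labels are allowed, but they make label-based row lookups ambiguous, which is why renumbering is usually preferred when appending rows.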
Deleting
Columns
You can delete columns using the “drop()” method or “del” keyword.
    import pandas as pd

    # Sample DataFrame
    df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

    # Deleting a column using drop()
    df.drop('C', axis=1, inplace=True)
    print(df)

    # Deleting a column using del keyword
    del df['B']
    print(df)

       A  B
    0  1  4
    1  2  5
    2  3  6
       A
    0  1
    1  2
    2  3
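As a variation on the example above, `drop()` can also be called without `inplace=True`, in which case it returns a new DataFrame. The sketch below uses the `columns=` keyword instead of `axis=1`; the column name `'Z'` is a made-up name that does not exist in the DataFrame.

```python
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Drop several columns at once; df itself is not modified
trimmed = df.drop(columns=['B', 'C'])
print(trimmed)

# errors='ignore' skips labels that are not present instead of raising KeyError
safe = df.drop(columns=['Z'], errors='ignore')
print(safe)
```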
Rows
You can delete rows using the “drop()” method by specifying the index.
    import pandas as pd

    # Sample DataFrame
    df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

    # Deleting a row
    df.drop(1, inplace=True)
    print(df)

       A  B  C
    0  1  4  7
    2  3  6  9
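Note that the output above keeps the original labels 0 and 2, leaving a gap. A minimal sketch of dropping several rows and, optionally, renumbering the index afterwards:

```python
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Drop several rows by label; returns a new DataFrame
remaining = df.drop(index=[0, 2])
print(remaining)

# Drop one row, then renumber the index from 0 (drop=True discards the old labels)
renumbered = df.drop(index=1).reset_index(drop=True)
print(renumbered)
```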
Renaming
Columns
You can rename columns using the “rename()” method.
    import pandas as pd

    # Sample DataFrame
    df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

    # Renaming columns
    df.rename(columns={'A': 'New_A', 'B': 'New_B'}, inplace=True)
    print(df)

       New_A  New_B  C
    0      1      4  7
    1      2      5  8
    2      3      6  9
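Besides a dictionary, `rename()` also accepts a function that is applied to every label. The sketch below, which lower-cases all column names, is only one illustration of that option.

```python
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Apply str.lower to every column label; returns a new DataFrame
lowered = df.rename(columns=str.lower)
print(lowered.columns.tolist())

# The original DataFrame keeps its labels because inplace was not used
print(df.columns.tolist())
```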
Rows
You can rename rows (index labels) using the “rename()” method with the “index” argument.

    import pandas as pd

    # Sample DataFrame
    df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

    # Renaming rows
    df.rename(index={0: 'row1', 2: 'row3'}, inplace=True)
    print(df)
          A  B  C
    row1  1  4  7
    1     2  5  8
    row3  3  6  9
Conditional Selection and Filtering
In Pandas, conditional selection and filtering allow you to extract subsets of data from a DataFrame based on certain conditions. We explore both separately.
Conditional selection
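Conditional selection refers to selecting specific data based on one or more conditions. Comparison operators (such as >, <, and ==) build the conditions, and logical operators (such as & and |) combine them. When combining conditions, wrap each one in parentheses.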
    import pandas as pd

    # Sample DataFrame
    df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})

    # Single condition
    result = df[df['A'] > 3]
    print(result)

    # Multiple conditions
    result = df[(df['A'] > 2) & (df['B'] < 8)]
    print(result)

       A  B
    3  4  8
       A  B
    2  3  7
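The example above combines conditions with `&` (and). As a further sketch, the same pattern works with `|` (or) and `~` (not), and `loc` can apply a condition while selecting a single column. The variable names are illustrative only.

```python
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})

# OR: rows where A is below 2 or B is above 7
either = df[(df['A'] < 2) | (df['B'] > 7)]
print(either)

# NOT: rows where A is not greater than 2
negated = df[~(df['A'] > 2)]
print(negated)

# loc: apply a condition to the rows and pick column B only
b_values = df.loc[df['A'] > 2, 'B']
print(b_values)
```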
Filtering
Filtering also selects specific data, but relies on dedicated methods rather than operators alone. The methods we cover here are “str.contains()”, “isin()”, and “query()”.
- str.contains() – filters rows whose string values contain a given substring or pattern. The match is case-sensitive by default.
- isin() – filters rows based on whether a value appears in a list or array.
- query() – filters rows using a query expression written as a string.
    import pandas as pd

    # Sample DataFrame
    df = pd.DataFrame({'Name': ['Ben', 'Sam', 'Emma', 'Alice'], 'Age': [18, 46, 22, 29]})

    # Select rows where 'Name' contains 'a'
    result = df[df['Name'].str.contains('a')]
    print(result)

    # Select rows where 'Age' is in a list of values
    result = df[df['Age'].isin([20, 21, 22])]
    print(result)

    # Select rows where 'Age' is greater than 28
    result = df.query('Age > 28')
    print(result)
       Name  Age
    1   Sam   46
    2  Emma   22
       Name  Age
    2  Emma   22
        Name  Age
    1    Sam   46
    3  Alice   29
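In the output above, 'Alice' is missing from the first result because `str.contains('a')` is case-sensitive. The sketch below shows a few common variations on the three methods; `min_age` is a hypothetical variable introduced only for this example.

```python
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'Name': ['Ben', 'Sam', 'Emma', 'Alice'], 'Age': [18, 46, 22, 29]})

# Case-insensitive match, so 'Alice' is included as well
any_case = df[df['Name'].str.contains('a', case=False)]
print(any_case)

# Negate isin() with ~ to keep rows whose Age is NOT in the list
not_listed = df[~df['Age'].isin([18, 22])]
print(not_listed)

# Reference a Python variable inside query() with the @ prefix
min_age = 20
in_range = df.query('Age >= @min_age and Age < 30')
print(in_range)
```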
This is an original data manipulation educational material created by aicorr.com.