Pandas

Data Manipulation

Modifying data

This is a data manipulation tutorial.

In Pandas, data manipulation refers to the process of modifying data. Pandas provides various data structures and functions for working with structured data. This tutorial covers modification, conditional selection, and filtering of data. Manipulation is a broad term that also encompasses transforming and cleaning data, applying functions, and analysing data. For those topics, please refer to our other Pandas tutorials.

Modifying data in Pandas is straightforward. Working with columns and rows involves tasks such as adding, deleting, and renaming them in a DataFrame. Let’s explore each task separately.

Adding

Columns

You can add new columns to a DataFrame by simply assigning values to them.

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6]})

# Adding a new column
df['C'] = [7, 8, 9]

print(df)
   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9
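
A new column does not have to be a literal list. As a minimal sketch, the example below derives a column from existing ones and also uses “assign()”, which returns a new DataFrame rather than changing the original. The column names “Sum” and “Double_A” are made up for illustration.

```python
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6]})

# Derive a column from existing columns (element-wise addition)
df['Sum'] = df['A'] + df['B']

# assign() returns a new DataFrame and leaves df unchanged
# ('Double_A' is a hypothetical column name)
df2 = df.assign(Double_A=df['A'] * 2)

print(df2)
```

Direct assignment is convenient when you want to change the DataFrame in place, while “assign()” is useful when you prefer to keep the original intact.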

Rows

You can add new rows to a DataFrame using the “pd.concat()” function. Passing “ignore_index=True” renumbers the rows of the result from 0.

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6]})

# Adding a new row
new_row = pd.DataFrame({'A': [4], 'B': [7]})
df = pd.concat([df, new_row], ignore_index=True)

print(df)
   A  B
0  1  4
1  2  5
2  3  6
3  4  7
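
For a single row, assigning to a new label with “loc” is another option. This is only a sketch and it assumes the DataFrame still has its default integer index (0, 1, 2, ...), so that “len(df)” is the next unused label.

```python
import pandas as pd

# Sample DataFrame with the default integer index
df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6]})

# Assumption: the index is 0..len(df)-1, so len(df) is a new label.
# Assigning to a label that does not exist yet appends a row.
df.loc[len(df)] = [4, 7]

print(df)
```

When adding many rows, collecting them first and calling “pd.concat()” once is usually preferable to growing the DataFrame row by row.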

Deleting

Columns

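You can delete columns using the “drop()” method or the “del” keyword. In “drop()”, “axis=1” targets columns, and “inplace=True” modifies the DataFrame directly instead of returning a new one.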

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6],
                   'C': [7, 8, 9]})

# Deleting a column using drop()
df.drop('C', axis=1, inplace=True)
print(df)

# Deleting a column using del keyword
del df['B']
print(df)
   A  B
0  1  4
1  2  5
2  3  6

   A
0  1
1  2
2  3
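
The sketch below shows two variations on the same idea: dropping several columns at once without “inplace”, and “pop()”, which removes a column and hands it back. The variable names “trimmed” and “removed” are illustrative only.

```python
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6],
                   'C': [7, 8, 9]})

# Drop several columns at once; df itself is not modified
trimmed = df.drop(columns=['B', 'C'])
print(trimmed)

# pop() removes the column from df and returns it as a Series
removed = df.pop('C')
print(removed)
print(df)
```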

Rows

You can delete rows using the “drop()” method by specifying the index label of each row to remove. The remaining rows keep their original labels.

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6],
                   'C': [7, 8, 9]})

# Deleting a row
df.drop(1, inplace=True)

print(df)
   A  B  C
0  1  4  7
2  3  6  9
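
Because the index is not renumbered after a drop, it can help to reset it. The following sketch drops several rows by label, and separately removes rows by condition before calling “reset_index(drop=True)”. The variable names are illustrative.

```python
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6],
                   'C': [7, 8, 9]})

# Drop several rows by index label; df itself is not modified
remaining = df.drop(index=[0, 2])
print(remaining)

# Keep only rows where 'A' is not 2, then renumber the index from 0
kept = df[df['A'] != 2].reset_index(drop=True)
print(kept)
```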

Renaming

Columns

You can rename columns using the “rename()” method.

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6],
                   'C': [7, 8, 9]})

# Renaming columns
df.rename(columns={'A': 'New_A', 'B': 'New_B'}, inplace=True)

print(df)
   New_A  New_B  C
0      1      4  7
1      2      5  8
2      3      6  9
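
Besides a dictionary, “rename()” also accepts a function that is applied to every column name, and all names can be replaced at once by assigning to “df.columns”. A minimal sketch, with the new names chosen purely for illustration:

```python
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6],
                   'C': [7, 8, 9]})

# Apply a function to every column name; returns a new DataFrame
lowered = df.rename(columns=str.lower)
print(lowered)

# Replace all column names at once (the list length must match)
df.columns = ['X', 'Y', 'Z']
print(df)
```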

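Rows

You can rename rows (index labels) using the “rename()” method with the “index” argument. Labels that are not listed keep their original names.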

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6],
                   'C': [7, 8, 9]})

# Renaming rows
df.rename(index={0: 'row1', 2: 'row3'}, inplace=True)

print(df)
      A  B  C
row1  1  4  7
1     2  5  8
row3  3  6  9
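
To relabel every row, a function can be passed instead of a dictionary. This sketch assumes the default integer index and uses a made-up “row1”, “row2”, ... naming scheme.

```python
import pandas as pd

# Sample DataFrame with the default integer index
df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6],
                   'C': [7, 8, 9]})

# The function receives each index label and returns the new label
renamed = df.rename(index=lambda i: f'row{i + 1}')

print(renamed)
```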

Conditional Selection and Filtering

In Pandas, conditional selection and filtering allow you to extract subsets of data from a DataFrame based on certain conditions. We explore both separately.

Conditional selection

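The term refers to selecting specific data based on one or more conditions. Comparison operators (such as >, <, ==) build each condition, and logical operators (& for “and”, | for “or”, ~ for “not”) combine them. When combining conditions, wrap each one in parentheses.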

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': [5, 6, 7, 8]})

# Single condition
result = df[df['A'] > 3]
print(result)

# Multiple conditions
result = df[(df['A'] > 2) & (df['B'] < 8)]
print(result)
   A  B
3  4  8

   A  B
2  3  7
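
The other logical operators work the same way. The sketch below uses | and ~, and also “loc” to pick a column while filtering rows. The variable names are illustrative.

```python
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': [5, 6, 7, 8]})

# Either condition may hold (logical "or")
either = df[(df['A'] < 2) | (df['B'] > 7)]
print(either)

# Negate a condition (logical "not")
negated = df[~(df['A'] > 2)]
print(negated)

# Filter rows and select a single column in one step
b_values = df.loc[df['A'] > 2, 'B']
print(b_values)
```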

Filtering

This term also refers to the process of selecting specific data, but involves some functions. The functions we cover here are “str.contains()”, “isin()”, and “query()”.

  • str.contains() – you can filter rows based on string values using string methods such as “str.contains()”.
  • isin() – you can filter rows based on whether a value is in a list or array with the “isin()” method.
  • query() – you can use the “query()” method to filter rows using a query expression.

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'Name': ['Ben', 'Sam', 'Emma', 'Alice'],
                   'Age': [18, 46, 22, 29]})

# Select rows where 'Name' contains a lowercase 'a' (the match is case-sensitive)
result = df[df['Name'].str.contains('a')]
print(result)

# Select rows where 'Age' is in a list of values
result = df[df['Age'].isin([20, 21, 22])]
print(result)

# Select rows where 'Age' is greater than 28
result = df.query('Age > 28')
print(result)
   Name  Age
1   Sam   46
2  Emma   22

   Name  Age
2  Emma   22

    Name  Age
1    Sam   46
3  Alice   29
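
Each of these functions has a common variation, sketched below: a case-insensitive match with “case=False”, an inverted “isin()” using ~, and a “query()” expression that refers to a Python variable with the @ prefix. The variable “min_age” and the chosen values are illustrative only.

```python
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'Name': ['Ben', 'Sam', 'Emma', 'Alice'],
                   'Age': [18, 46, 22, 29]})

# Case-insensitive match: 'Alice' is now included as well
any_a = df[df['Name'].str.contains('a', case=False)]
print(any_a)

# Rows where 'Age' is NOT in the list
not_listed = df[~df['Age'].isin([18, 22])]
print(not_listed)

# Refer to a Python variable inside query() with the @ prefix
min_age = 20  # hypothetical threshold
in_range = df.query('Age > @min_age and Age < 40')
print(in_range)
```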

This is an original data manipulation educational material created by aicorr.com.


by AICorr Team
