Basic Data Operations
Selecting and indexing data
Selecting, indexing, and slicing data are the foundation of data manipulation and analysis in Pandas, and more advanced operations build on them. This section covers these basic operations for both Series and DataFrames.
Series
Selecting data from a Pandas Series involves accessing specific elements or subsets of elements based on their index labels, positions, or certain conditions.
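Series – select data through positional indexing
    import pandas as pd

    # Series with string index labels
    s = pd.Series([10, 20, 30, 40, 50], index=['A', 'B', 'C', 'D', 'E'])

    # Selecting a single element by position
    # (plain s[1] is deprecated in recent Pandas versions when the index holds labels)
    print(s.iloc[1])

    # Selecting multiple elements by position
    print(s.iloc[[0, 4]])
Output:
    20
    A    10
    E    50
    dtype: int64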
Series – select data through label indexing
    import pandas as pd

    # Series
    s = pd.Series([10, 20, 30, 40, 50], index=['A', 'B', 'C', 'D', 'E'])

    # Selecting a single element by label
    print(s['B'])

    # Selecting multiple elements by label
    print(s[['A', 'E']])
Output:
    20
    A    10
    E    50
    dtype: int64
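The Series description above also mentions selecting by condition, which the two examples do not show. A minimal sketch, reusing the same Series; the threshold 25 is an arbitrary value chosen for illustration:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=['A', 'B', 'C', 'D', 'E'])

# Comparing a Series to a scalar yields a boolean Series (a mask)
mask = s > 25

# Indexing with the mask keeps only the elements where the mask is True
selected = s[mask]
print(selected)
```

The mask has the same index as the Series, so the selected elements keep their original labels (here C, D, and E).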
DataFrame
Selecting data from a Pandas DataFrame involves accessing specific rows, columns, or subsets of data based on various criteria such as index labels, positions, or conditions.
DataFrame – select data through label indexing (column name)
    import pandas as pd

    # DataFrame
    df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

    # Selecting a single column (returns a Series)
    print(df['A'])

    # Selecting multiple columns (returns a DataFrame)
    print(df[['A', 'C']])

    # Selecting a specific value: row label 1, column 'A'
    # (a single loc call avoids the chained indexing of df['A'][1])
    print(df.loc[1, 'A'])
Output:
    0    1
    1    2
    2    3
    Name: A, dtype: int64
       A  C
    0  1  7
    1  2  8
    2  3  9
    2
DataFrame – select data through indexing (loc & iloc indexers)
    # Selecting rows by index label
    print(df.loc[0])   # Select row with index label 0

    # Selecting rows by position
    print(df.iloc[0])  # Select first row
Output:
    A    1
    B    4
    C    7
    Name: 0, dtype: int64
    A    1
    B    4
    C    7
    Name: 0, dtype: int64
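In the example above, loc and iloc return the same row only because the default index labels (0, 1, 2) happen to coincide with the row positions. A short sketch of the difference, assuming hypothetical string row labels 'x', 'y', 'z' that are not part of the original example:

```python
import pandas as pd

# Same data as before, but with string row labels (hypothetical, for illustration)
df_labeled = pd.DataFrame(
    {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]},
    index=['x', 'y', 'z'],
)

# loc looks rows up by index label
row_by_label = df_labeled.loc['y']

# iloc looks rows up by integer position (0-based)
row_by_position = df_labeled.iloc[1]

# Both expressions select the second row of this DataFrame
print(row_by_label.equals(row_by_position))
```

With these labels, df_labeled.loc[1] would raise a KeyError, because 1 is a position and not one of the index labels.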
DataFrame – select rows and columns simultaneously
    import pandas as pd

    # DataFrame
    df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

    # Selecting specific rows and columns
    print(df.loc[[0, 2], ['A', 'B']])
Output:
       A  B
    0  1  4
    2  3  6
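The DataFrame description above also lists conditions as a selection criterion. A minimal sketch of condition-based row selection combined with column selection, using the same DataFrame; the condition A > 1 is an arbitrary choice for illustration:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Boolean mask: True for rows where column 'A' is greater than 1
mask = df['A'] > 1

# Pass the mask as the row selector and a list of labels as the column selector
filtered = df.loc[mask, ['A', 'C']]
print(filtered)
```

Here loc accepts a boolean Series in the row position, so filtering rows and picking columns happen in a single step.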
Slicing data
Slicing in Pandas is straightforward. Slicing a Series selects a contiguous subset of elements by index label or position, while slicing a DataFrame selects a subset of rows and/or columns by label or position. Note that label-based slices include the end label, whereas position-based slices exclude the end position, as in standard Python.
Series
    import pandas as pd

    # Series
    s = pd.Series([10, 20, 30, 40, 50], index=['A', 'B', 'C', 'D', 'E'])

    # Slicing Series by index label (the end label 'D' is included)
    print(s['B':'D'])

    # Slicing Series by position (the end position 4 is excluded)
    print(s.iloc[1:4])
Output:
    B    20
    C    30
    D    40
    dtype: int64
    B    20
    C    30
    D    40
    dtype: int64
DataFrame
    import pandas as pd

    # DataFrame
    df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

    # Slicing DataFrame by rows and columns
    # Slice rows from label 0 to 1 and columns from 'A' to 'B' (both ends included)
    print(df.loc[0:1, 'A':'B'])
Output:
       A  B
    0  1  4
    1  2  5
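The loc slice above includes both end labels. As a contrast, here is a sketch of the position-based equivalent with iloc, where the end positions are excluded, so the stop values must be one higher to select the same rows and columns:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Label-based: rows 0 to 1 and columns 'A' to 'B', both ends included
by_label = df.loc[0:1, 'A':'B']

# Position-based: rows 0 to 1 and columns 0 to 1, end positions excluded
by_position = df.iloc[0:2, 0:2]

# Both slices select the same 2x2 block of the DataFrame
print(by_label.equals(by_position))
```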
This is an original basic data operations educational material created by aicorr.com.