Series and DataFrame
Introduction to Series and DataFrame objects
This tutorial introduces Series and DataFrame, the two main Pandas objects.
Understanding these structures is crucial, as they form the backbone of data manipulation and analysis in Pandas. The two are distinct objects with different characteristics. Let’s look at each in turn.
Series
A Series is a one-dimensional labelled array that can hold data of any type (integer, float, string, etc.). It is similar to a one-dimensional NumPy array or a Python list but with additional capabilities, such as custom index labels. Each element in a Series has a corresponding label or index, which allows for easy and efficient data access. Series can be created from various data sources, including Python lists, NumPy arrays, dictionaries, or scalar values.
Index | Column 1 |
0 | datapoint |
1 | datapoint |
2 | datapoint |
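To make the idea of index labels concrete, here is a minimal sketch. The weekday labels and temperature values are made up for illustration; `.loc` looks an element up by its label and `.iloc` by its integer position.

```python
import pandas as pd

# Hypothetical data: temperatures labelled by weekday
temps = pd.Series([21.5, 23.0, 19.8], index=["Mon", "Tue", "Wed"])

by_label = temps.loc["Tue"]     # look up by index label
by_position = temps.iloc[0]     # look up by integer position

print(by_label)
print(by_position)
print(list(temps.index))
```

Without an explicit `index`, Pandas falls back to the default integer labels 0, 1, 2, and so on, as in the table above.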
DataFrame
A DataFrame is a two-dimensional labelled data structure resembling a spreadsheet or SQL table. It consists of rows and columns, and different columns can hold different data types. Like a Series, a DataFrame has row labels (the index), and it adds column labels as well, enabling easy data manipulation and indexing. DataFrames can be created from many data sources, including dictionaries, NumPy arrays, lists of dictionaries, other DataFrames, external files such as CSV or Excel, and SQL databases.
Index | Column 1 | Column 2 |
0 | datapoint | datapoint |
1 | datapoint | datapoint |
2 | datapoint | datapoint |
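The sketch below illustrates row and column labels on a small DataFrame. The names, values, and the row labels `r1` and `r2` are hypothetical; the point is that each column keeps its own data type, and that selecting a single column gives back a Series.

```python
import pandas as pd

# Hypothetical data with a text, an integer, and a float column
df = pd.DataFrame(
    {"Name": ["Ben", "Sam"], "Age": [42, 19], "Score": [88.5, 92.0]},
    index=["r1", "r2"],
)

age_column = df["Age"]        # a single column is a Series
cell = df.loc["r2", "Name"]   # row label plus column label

print(df.dtypes)              # one dtype per column
print(type(age_column).__name__)
print(cell)
```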
Create Series and DataFrame
This section covers the basics of creating Series and DataFrame objects in Pandas.
Create Series
Creating a Series in Pandas is straightforward. You can create a Series from a Python list, NumPy array, dictionary, or even a scalar value.
From a list
    import pandas as pd

    # Series from a Python list
    data = [10, 20, 30, 40, 50]
    s = pd.Series(data)
    print(s)

Output:

    0    10
    1    20
    2    30
    3    40
    4    50
    dtype: int64
From a dictionary
    import pandas as pd

    # dictionary
    data = {'A': 10, 'B': 20, 'C': 30, 'D': 40, 'E': 50}

    # Series from a dictionary
    s = pd.Series(data)
    print(s)

Output:

    A    10
    B    20
    C    30
    D    40
    E    50
    dtype: int64
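When a Series is built from a dictionary, the keys become the index labels. As a small sketch of how this interacts with an explicit `index` argument (the label `Z` is a made-up key that is deliberately missing from the dictionary): Pandas selects and orders the values by the labels you pass, and fills missing labels with NaN, which also turns the values into floats.

```python
import pandas as pd

data = {"A": 10, "B": 20, "C": 30}

# Ask for labels in a different order, plus one that is not in the dict
s = pd.Series(data, index=["C", "A", "Z"])

print(s)
print(s.isna().sum())   # number of missing values
```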
With a NumPy array
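    import pandas as pd
    import numpy as np

    # NumPy array
    data = np.array([10, 20, 30, 40, 50])

    # Series from the NumPy array
    s = pd.Series(data)
    print(s)

Output:

    0    10
    1    20
    2    30
    3    40
    4    50
    dtype: int32

Note: the Series takes its dtype from the NumPy array, so depending on the platform and NumPy version the last line may read int32 or int64.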
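If a specific integer width matters, one option is to pass `dtype` explicitly when creating the Series. This is a minimal sketch, assuming 64-bit integers are what you want:

```python
import numpy as np
import pandas as pd

data = np.array([10, 20, 30, 40, 50])

# Request a fixed dtype instead of inheriting the array's default
s = pd.Series(data, dtype="int64")

print(s.dtype)
```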
From a scalar
    import pandas as pd

    # Series from a scalar value (repeated 5 times)
    s = pd.Series(5, index=['A', 'B', 'C', 'D', 'E'])
    print(s)

Output:

    A    5
    B    5
    C    5
    D    5
    E    5
    dtype: int64
Create DataFrame
A DataFrame can be created from various data sources, including Python dictionaries, lists of dictionaries, and NumPy arrays, or by reading data from external files such as CSV and Excel or from SQL databases.
From a dictionary
    import pandas as pd

    # dictionary with column names as keys and lists as values
    data = {'Name': ['Charlie', 'Martin', 'Ben', 'Sam'],
            'Age': [26, 18, 42, 19],
            'Salary': [40000, 25500, 67000, 29000]}

    # DataFrame from the dictionary
    df = pd.DataFrame(data)
    print(df)

Output:

          Name  Age  Salary
    0  Charlie   26   40000
    1   Martin   18   25500
    2      Ben   42   67000
    3      Sam   19   29000
From a list of dictionaries
    import pandas as pd

    # a list of dictionaries
    data = [{'Name': 'Martin', 'Age': 23, 'Salary': 40000},
            {'Name': 'Ben', 'Age': 26, 'Salary': 20000},
            {'Name': 'Charlie', 'Age': 42, 'Salary': 80000},
            {'Name': 'Sam', 'Age': 18, 'Salary': 23000}]

    # DataFrame from the list of dictionaries
    df = pd.DataFrame(data)
    print(df)

Output:

          Name  Age  Salary
    0   Martin   23   40000
    1      Ben   26   20000
    2  Charlie   42   80000
    3      Sam   18   23000
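In a list of dictionaries, each dictionary becomes one row and the keys become column names. The following sketch, using made-up records with deliberately missing keys, shows what happens when the dictionaries do not all share the same keys: the missing cells are filled with NaN.

```python
import pandas as pd

# Hypothetical records; the second has no 'Age' and only the third has 'Salary'
data = [
    {"Name": "Martin", "Age": 23},
    {"Name": "Ben"},
    {"Name": "Sam", "Age": 18, "Salary": 23000},
]

df = pd.DataFrame(data)

print(df)
print(df.isna().sum())   # missing values per column
```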
With a NumPy array
    import pandas as pd
    import numpy as np

    # NumPy array
    data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

    # DataFrame from the NumPy array
    df = pd.DataFrame(data, columns=['A', 'B', 'C'])
    print(df)

Output:

       A  B  C
    0  1  2  3
    1  4  5  6
    2  7  8  9
From an external file
    import pandas as pd

    # Read data from a CSV file
    df = pd.read_csv('data.csv')

    # Read data from an Excel file
    df = pd.read_excel('data.xlsx')
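The file names above assume that `data.csv` and `data.xlsx` exist on disk. To try the same idea without any files, the sketch below reads CSV text from an in-memory buffer and queries an in-memory SQLite database. The CSV contents, the `people` table, and its rows are all hypothetical and exist only for this illustration.

```python
import io
import sqlite3

import pandas as pd

# In-memory stand-in for a CSV file (hypothetical contents)
csv_text = "Name,Age\nCharlie,26\nMartin,18\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# In-memory SQLite database with a hypothetical 'people' table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (Name TEXT, Age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ben", 42), ("Sam", 19)],
)
conn.commit()

# Load the result of a SQL query into a DataFrame
df_sql = pd.read_sql_query("SELECT Name, Age FROM people ORDER BY Name", conn)
conn.close()

print(df_csv)
print(df_sql)
```

Reading from a real database works along the same lines, with the connection object replaced by one for your own database.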
This is original Series and DataFrame educational material created by aicorr.com.