Pandas

Series and DataFrame

Introduction to Series and DataFrame objects

This tutorial introduces the Pandas Series and DataFrame objects.

Series and DataFrame are the two main Pandas objects, and understanding them is crucial because they form the backbone of data manipulation and analysis in Pandas. The two terms refer to different objects with distinct characteristics: a Series is one-dimensional, while a DataFrame is two-dimensional. Let’s look at each in turn.

Series

A Series is a one-dimensional labelled array that can hold data of any type (integer, float, string, etc.). It is similar to a one-dimensional NumPy array or a Python list but with additional capabilities, such as custom index labels. Each element in a Series has a corresponding label or index, which allows for easy and efficient data access. Series can be created from various data sources, including Python lists, NumPy arrays, dictionaries, or scalar values.

Index  Column 1
0      datapoint
1      datapoint
2      datapoint
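As a minimal sketch of the custom index labels described above, the following example builds a Series with string labels and reads values by label and by position. The city names and temperature values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical data: three temperature readings labelled by city
temps = pd.Series([21.5, 18.0, 25.3], index=['Paris', 'Oslo', 'Rome'])

# Access a value by its index label
oslo = temps['Oslo']

# Access a value by its integer position
first = temps.iloc[0]

# The labels are stored in the Series index
labels = list(temps.index)

print(oslo)    # 18.0
print(first)   # 21.5
print(labels)  # ['Paris', 'Oslo', 'Rome']
```

Using .iloc for positional access keeps the intent explicit when the index labels are not integers.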

DataFrame

A DataFrame is a two-dimensional labelled data structure resembling a spreadsheet or SQL table. It consists of rows and columns, and each column can hold a different data type. Like a Series, a DataFrame has row labels (an index); it also has column labels, which together enable easy data manipulation and indexing. DataFrames are highly versatile: they can be created from various data sources, including dictionaries, NumPy arrays, lists of dictionaries, other DataFrames, or external sources such as CSV files, Excel files, or SQL databases.

Index  Column 1   Column 2
0      datapoint  datapoint
1      datapoint  datapoint
2      datapoint  datapoint
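To illustrate that each column of a DataFrame can hold its own data type, here is a small sketch with a string column, an integer column, and a float column. The 'Height' column and its values are assumptions added for illustration only.

```python
import pandas as pd

# Each column holds a different kind of data
df = pd.DataFrame({
    'Name': ['Charlie', 'Martin'],   # strings
    'Age': [26, 18],                 # integers
    'Height': [1.82, 1.75],          # floats (hypothetical values)
})

# One dtype per column
print(df.dtypes)

# Selecting a single column returns a Series
ages = df['Age']
print(type(ages).__name__)  # Series

# (number of rows, number of columns)
print(df.shape)  # (2, 3)
```

Selecting one column gives back a Series, which is why the two objects are usually learned together.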

Create Series and DataFrame

This section covers the basics of creating Series and DataFrames in Pandas.

Create Series

Creating a Series in Pandas is straightforward. You can create a Series from a Python list, NumPy array, dictionary, or even a scalar value.

From a list

import pandas as pd

# Series from a Python list
data = [10, 20, 30, 40, 50]
s = pd.Series(data)

print(s)
0    10
1    20
2    30
3    40
4    50
dtype: int64

From a dictionary

import pandas as pd

# dictionary
data = {'A': 10, 'B': 20, 'C': 30, 'D': 40, 'E': 50}

# Series from a dictionary
s = pd.Series(data)

print(s)
A    10
B    20
C    30
D    40
E    50
dtype: int64
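When a dictionary is combined with an explicit index, Pandas picks the values by key in the order given, and any label missing from the dictionary becomes NaN. The sketch below assumes a hypothetical label 'Z' that is not a key in the dictionary.

```python
import pandas as pd

data = {'A': 10, 'B': 20, 'C': 30}

# Request labels in a custom order; 'Z' is not a key in the dictionary
s = pd.Series(data, index=['C', 'A', 'Z'])

print(s)
# 'C' and 'A' keep their values, 'Z' is NaN,
# and the dtype becomes float64 because NaN is a float
```

Note that 'B' is dropped because it is not listed in the index.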

From a NumPy array

import pandas as pd
import numpy as np

# NumPy array
data = np.array([10, 20, 30, 40, 50])

# Series from the NumPy array
s = pd.Series(data)

print(s)
0    10
1    20
2    30
3    40
4    50
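dtype: int32

Note: the dtype here is inherited from the NumPy array, whose default integer type is platform-dependent. On most systems the output is int64; int32 typically appears on Windows with NumPy versions before 2.0.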
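One way to avoid relying on the array's default integer type is to pass dtype explicitly when creating the Series. This is a minimal sketch using the same values as the example above.

```python
import numpy as np
import pandas as pd

data = np.array([10, 20, 30, 40, 50])

# Request a specific dtype instead of inheriting the array's default
s_int = pd.Series(data, dtype='int64')
s_float = pd.Series(data, dtype='float64')

print(s_int.dtype)    # int64
print(s_float.dtype)  # float64
```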

From a scalar

import pandas as pd

# Series from a scalar value (repeated 5 times)
s = pd.Series(5, index=['A', 'B', 'C', 'D', 'E'])

print(s)
A    5
B    5
C    5
D    5
E    5
dtype: int64

Create DataFrame

You can create a DataFrame from various data sources, including Python dictionaries, lists of dictionaries, and NumPy arrays, or by reading data from external sources such as CSV files, Excel files, or SQL databases.

From a dictionary

import pandas as pd

# dictionary with column names as keys and lists as values
data = {'Name': ['Charlie', 'Martin', 'Ben', 'Sam'],
        'Age': [26, 18, 42, 19],
        'Salary': [40000, 25500, 67000, 29000]}

# DataFrame from the dictionary
df = pd.DataFrame(data)

print(df)
      Name  Age  Salary
0  Charlie   26   40000
1   Martin   18   25500
2      Ben   42   67000
3      Sam   19   29000

From a list of dictionaries

import pandas as pd

# a list of dictionaries
data = [{'Name': 'Martin', 'Age': 23, 'Salary': 40000},
        {'Name': 'Ben', 'Age': 26, 'Salary': 20000},
        {'Name': 'Charlie', 'Age': 42, 'Salary': 80000},
        {'Name': 'Sam', 'Age': 18, 'Salary': 23000}]

# DataFrame from the list of dictionaries
df = pd.DataFrame(data)

print(df)
      Name  Age  Salary
0   Martin   23   40000
1      Ben   26   20000
2  Charlie   42   80000
3      Sam   18   23000
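In a list of dictionaries, the records do not need to share the same keys. As a sketch under that assumption, the example below deliberately omits some keys; Pandas fills the gaps with NaN.

```python
import pandas as pd

# Records with deliberately missing keys (illustrative data)
data = [{'Name': 'Martin', 'Age': 23, 'Salary': 40000},
        {'Name': 'Ben', 'Age': 26},          # no 'Salary'
        {'Name': 'Sam', 'Salary': 23000}]    # no 'Age'

df = pd.DataFrame(data)

print(df)

# Count the missing values in each column
missing = df.isna().sum()
print(missing)
```

Columns that contain NaN are stored as floats, so 'Age' and 'Salary' no longer print as plain integers.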

From a NumPy array

import pandas as pd
import numpy as np

# NumPy array
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# DataFrame from the NumPy array
df = pd.DataFrame(data, columns=['A', 'B', 'C'])

print(df)
   A  B  C
0  1  2  3
1  4  5  6
2  7  8  9
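Row labels can be set in the same way as column labels by passing index. The sketch below reuses the array above with hypothetical row labels 'x', 'y', and 'z', then reads data by label with .loc.

```python
import numpy as np
import pandas as pd

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Custom row labels (index) and column labels
df = pd.DataFrame(data, index=['x', 'y', 'z'], columns=['A', 'B', 'C'])

# Select a whole row by its label (returns a Series)
row_y = df.loc['y']

# Select a single value by row label and column label
value = df.loc['z', 'B']

print(df)
print(row_y.tolist())  # [4, 5, 6]
print(value)           # 8
```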

From an external file

import pandas as pd

# Read data from a CSV file ('data.csv' is a placeholder path)
df_csv = pd.read_csv('data.csv')

# Read data from an Excel file (requires an Excel engine such as openpyxl)
df_excel = pd.read_excel('data.xlsx')
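The file names above are placeholders, so those calls only run if the files exist. To try read_csv without a file on disk, one option is to pass an in-memory text buffer instead of a path. The CSV content below is made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content held in memory instead of a file on disk
csv_text = "Name,Age,Salary\nCharlie,26,40000\nMartin,18,25500\n"

# read_csv accepts file-like objects as well as paths
df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.shape)  # (2, 3)
```

The first line of the text is treated as the header row, which supplies the column names.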

This is an original series and dataframe educational material created by aicorr.com.


by AICorr Team
