Pandas

Time Series Analysis

Introduction to time series data

This is a time series analysis tutorial.

Time series data represents observations or measurements taken at different points in time, typically at regular intervals. Examples include stock prices, weather data, economic indicators, sensor readings, and social media activity over time. Pandas offers powerful tools for handling and analysing time series data efficiently. Let’s dive into the different techniques.

First, we create a sample time series with the Pandas “date_range()” function. The “periods” parameter sets the number of timestamps to generate. The “freq” parameter takes an offset alias (in our scenario, “D” stands for calendar day frequency).

Then, we build a Pandas dataframe that uses these timestamps as its index.

import pandas as pd

# Sample time series with timestamps
dates = pd.date_range('2024-01-01', periods=10, freq='D')
print(dates)

# DataFrame with sequential values (0 to 9) and timestamps as index
data = pd.DataFrame({'value': range(10)}, index=dates)
print(data)

The following is the output for both the time series index and the dataframe.

# Time Series Data
DatetimeIndex(['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04',
               '2024-01-05', '2024-01-06', '2024-01-07', '2024-01-08',
               '2024-01-09', '2024-01-10'],
              dtype='datetime64[ns]', freq='D')

# Dataframe
            value
2024-01-01      0
2024-01-02      1
2024-01-03      2
2024-01-04      3
2024-01-05      4
2024-01-06      5
2024-01-07      6
2024-01-08      7
2024-01-09      8
2024-01-10      9
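
The “freq” parameter accepts other offset aliases as well. The following is a minimal sketch, with illustrative variable names that are not part of the original example. It shows business days (“B”), a multiple of an alias (“2D”), and a range defined by a start and an end date instead of “periods”.

```python
import pandas as pd

# Illustrative names only; they do not appear elsewhere in the tutorial

# Business days: 'B' skips Saturdays and Sundays
business_days = pd.date_range('2024-01-01', periods=6, freq='B')
print(business_days)

# Every second calendar day: a number in front of the alias acts as a multiple
every_other_day = pd.date_range('2024-01-01', periods=5, freq='2D')
print(every_other_day)

# A start and an end date can be used instead of `periods`
first_week = pd.date_range(start='2024-01-01', end='2024-01-07', freq='D')
print(len(first_week))
```

Since 1 January 2024 is a Monday, the business-day index jumps from 5 January straight to 8 January.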

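Now, let’s see how to access time series data. With a datetime index, “loc” accepts date strings, so we can select a single date, a range of dates (both endpoints included), or a whole month. Because the sample only covers January, the month selection returns every row.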

# Specific date
print(data.loc['2024-01-03'])

# Within a range of dates
print(data.loc['2024-01-03':'2024-01-07'])

# Specific month
print(data.loc['2024-01'])

value    2
Name: 2024-01-03 00:00:00, dtype: int64

            value
2024-01-03      2
2024-01-04      3
2024-01-05      4
2024-01-06      5
2024-01-07      6

            value
2024-01-01      0
2024-01-02      1
2024-01-03      2
2024-01-04      3
2024-01-05      4
2024-01-06      5
2024-01-07      6
2024-01-08      7
2024-01-09      8
2024-01-10      9
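
Two further access patterns may be useful. This is a rough sketch that rebuilds the same sample dataframe; the names “first_five” and “weekdays” are illustrative assumptions. It shows an open-ended slice and a boolean mask built from the index.

```python
import pandas as pd

dates = pd.date_range('2024-01-01', periods=10, freq='D')
data = pd.DataFrame({'value': range(10)}, index=dates)

# Open-ended slice: everything up to and including 5 January
first_five = data.loc[:'2024-01-05']
print(first_five)

# Boolean mask on the index: keep weekdays only (Monday=0 ... Sunday=6)
weekdays = data[data.index.dayofweek < 5]
print(weekdays)
```

The weekday filter drops 6 and 7 January, which fall on a Saturday and a Sunday.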

Resampling and frequency conversion

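Resampling involves changing the frequency of the time series observations. We can upsample (increase frequency) or downsample (decrease frequency) the data. The example below downsamples the daily dataframe to monthly frequency, once with the sum and once with the mean.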

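# Resample to monthly frequency (sum of each month)
# Note: pandas 2.2+ deprecates 'M' in favour of 'ME' (month end)
monthly_data = data.resample('M').sum()
print(monthly_data)

# Resample to monthly frequency (mean of each month)
monthly_data = data.resample('M').mean()
print(monthly_data)

            value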
2024-01-31     45

            value
2024-01-31    4.5
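
The example above only downsamples. As a minimal sketch of the opposite direction, the code below upsamples a small daily series to a 12-hour frequency. The series “daily” and the choice of forward filling are assumptions made for illustration; other fill strategies (such as interpolation) may suit real data better.

```python
import pandas as pd

# Hypothetical three-day sample, not part of the original tutorial
dates = pd.date_range('2024-01-01', periods=3, freq='D')
daily = pd.Series([10, 20, 30], index=dates)

# Upsample from daily to 12-hourly; the new timestamps have no data yet (NaN)
upsampled = daily.resample('12h').asfreq()
print(upsampled)

# Fill the gaps by carrying the last known value forward
filled = daily.resample('12h').ffill()
print(filled)
```

Upsampling creates timestamps that have no observation, so a fill rule has to be chosen explicitly.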

Time shifting and lagging

Time shifting and lagging both move the values of a time series relative to its index, either forward or backwards in time.

Time shifting

This method involves shifting the entire time series by a certain offset (forward or backwards). First, we create sample time series data.

import pandas as pd

# Sample time series data
dates = pd.date_range('2024-01-01', periods=5, freq='D')
data = pd.Series([10, 20, 30, 40, 50], index=dates)

print(data)

2024-01-01    10
2024-01-02    20
2024-01-03    30
2024-01-04    40
2024-01-05    50
Freq: D, dtype: int64

Then, we shift the data forward as well as backwards by 1 period. The position left without a value becomes NaN, which is why the result changes from int64 to float64.

# Forward shift
shifted_forward = data.shift(periods=1) 
print(shifted_forward)

# Backward shift
shifted_backward = data.shift(periods=-1)
print(shifted_backward)

2024-01-01     NaN
2024-01-02    10.0
2024-01-03    20.0
2024-01-04    30.0
2024-01-05    40.0
Freq: D, dtype: float64

2024-01-01    20.0
2024-01-02    30.0
2024-01-03    40.0
2024-01-04    50.0
2024-01-05     NaN
Freq: D, dtype: float64
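
Two related uses of “shift()” are sketched below, under the assumption that the same five-day sample is used. Passing “freq” moves the index rather than the values, so no NaN appears. Subtracting a shifted copy gives the change from one day to the next. The names “shifted_index” and “daily_change” are illustrative.

```python
import pandas as pd

dates = pd.date_range('2024-01-01', periods=5, freq='D')
data = pd.Series([10, 20, 30, 40, 50], index=dates)

# Shift the index instead of the values: every date moves forward by 1 day
shifted_index = data.shift(periods=1, freq='D')
print(shifted_index)

# Day-over-day change: current value minus the previous day's value
daily_change = data - data.shift(periods=1)
print(daily_change)
```

In the first result the values stay intact and the index runs from 2 January to 6 January. In the second, the first entry is NaN because there is no earlier day to compare with.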

Lagging

Lagging is a shift by one or more periods so that each timestamp holds the value from an earlier period. It is often used to create lag features for time series analysis. Let’s create the same sample data again.

import pandas as pd

# Sample time series data
dates = pd.date_range('2024-01-01', periods=5, freq='D')
data = pd.Series([10, 20, 30, 40, 50], index=dates)

print(data)

2024-01-01    10
2024-01-02    20
2024-01-03    30
2024-01-04    40
2024-01-05    50
Freq: D, dtype: int64

Now, we can lag the data by 1 period.

# Lagging by 1 period: each date receives the previous day's value
lagged_data = data.shift(periods=1)
print(lagged_data)

2024-01-01     NaN
2024-01-02    10.0
2024-01-03    20.0
2024-01-04    30.0
2024-01-05    40.0
Freq: D, dtype: float64
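
To illustrate how lagged copies become features, here is a minimal sketch that places the original values next to two lags in one dataframe. The column names “lag_1” and “lag_2” and the decision to drop incomplete rows are assumptions made for this example.

```python
import pandas as pd

dates = pd.date_range('2024-01-01', periods=5, freq='D')
data = pd.Series([10, 20, 30, 40, 50], index=dates)

# Collect the original values and two lagged copies (hypothetical column names)
features = pd.DataFrame({
    'value': data,
    'lag_1': data.shift(periods=1),  # value from 1 day earlier
    'lag_2': data.shift(periods=2),  # value from 2 days earlier
})

# Rows without a full history contain NaN; here they are simply dropped
features = features.dropna()
print(features)
```

Only the rows from 3 January onwards remain, since the first two dates lack at least one lagged value.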

Rolling window functions

Rolling window functions in Pandas allow you to perform calculations over a sliding window of data. These functions are particularly useful for tasks like moving averages, smoothing noisy data, or computing rolling statistics in time series analysis.

We cover the calculation of Simple Moving Average (SMA), Exponential Moving Average (EMA), and Rolling Statistics. Let’s dive into each one of them.

Simple moving average (SMA)

The Simple Moving Average is the unweighted mean of the values in a fixed-size window. First, we create sample data. Then, we calculate the SMA with a window of 3. The first two results are NaN because a full window is not yet available.

import pandas as pd

# Sample time series data
dates = pd.date_range('2024-01-01', periods=10, freq='D')
data = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100], index=dates)

# Simple moving average (window size of 3)
sma = data.rolling(window=3).mean()
print(sma)

2024-01-01     NaN
2024-01-02     NaN
2024-01-03    20.0
2024-01-04    30.0
2024-01-05    40.0
2024-01-06    50.0
2024-01-07    60.0
2024-01-08    70.0
2024-01-09    80.0
2024-01-10    90.0
Freq: D, dtype: float64
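
The leading NaN values and the alignment of the window can both be adjusted. The sketch below uses the “min_periods” and “center” parameters of “rolling()” on the same sample data; the variable names are illustrative.

```python
import pandas as pd

dates = pd.date_range('2024-01-01', periods=10, freq='D')
data = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100], index=dates)

# min_periods=1 averages whatever is available, so there are no leading NaN values
sma_partial = data.rolling(window=3, min_periods=1).mean()
print(sma_partial)

# center=True aligns each average with the middle of its window
sma_centered = data.rolling(window=3, center=True).mean()
print(sma_centered)
```

With “center=True” the NaN values appear at both ends of the series instead of only at the start.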

Exponential moving average (EMA)

The Exponential Moving Average gives more weight to recent data points when calculating the average. We create sample data. Afterwards, we calculate the EMA.

import pandas as pd

# Sample time series data
dates = pd.date_range('2024-01-01', periods=10, freq='D')
data = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100], index=dates)

# Exponential moving average (with a span of 3)
ema = data.ewm(span=3).mean()
print(ema)

2024-01-01    10.000000
2024-01-02    16.666667
2024-01-03    24.285714
2024-01-04    32.666667
2024-01-05    41.612903
2024-01-06    50.952381
2024-01-07    60.551181
2024-01-08    70.313725
2024-01-09    80.176125
2024-01-10    90.097752
Freq: D, dtype: float64
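
To see where these numbers come from, the sketch below recomputes the value for 2024-01-03 by hand. It assumes the default “adjust=True” behaviour of “ewm()”, where the smoothing factor is alpha = 2 / (span + 1) and each older observation is weighted by a further factor of (1 - alpha). The names “weights”, “values” and “manual” are illustrative.

```python
import pandas as pd

dates = pd.date_range('2024-01-01', periods=10, freq='D')
data = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100], index=dates)

span = 3
alpha = 2 / (span + 1)  # smoothing factor: 0.5 for a span of 3

# Weighted average for the third observation (2024-01-03):
# the newest value has weight 1, each older value decays by (1 - alpha)
weights = [(1 - alpha) ** i for i in range(3)]  # [1.0, 0.5, 0.25]
values = [30, 20, 10]                           # newest first
manual = sum(w * v for w, v in zip(weights, values)) / sum(weights)
print(manual)

# The pandas result for the same date (default adjust=True)
ema = data.ewm(span=span).mean()
print(ema.iloc[2])
```

Both lines print approximately 24.285714, which matches the third row of the output above.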

Rolling statistics

Pandas allows the computation of various rolling statistics such as rolling sum, rolling max, rolling min, and so on. We create sample data, and then we apply each rolling statistic to it.

import pandas as pd

# Sample time series data
dates = pd.date_range('2024-01-01', periods=10, freq='D')
data = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100], index=dates)

# Rolling sum (window size of 3)
rolling_sum = data.rolling(window=3).sum()
print(rolling_sum)

# Rolling maximum (window size of 3)
rolling_max = data.rolling(window=3).max()
print(rolling_max)

# Rolling minimum (window size of 3)
rolling_min = data.rolling(window=3).min()
print(rolling_min)

2024-01-01      NaN
2024-01-02      NaN
2024-01-03     60.0
2024-01-04     90.0
2024-01-05    120.0
2024-01-06    150.0
2024-01-07    180.0
2024-01-08    210.0
2024-01-09    240.0
2024-01-10    270.0
Freq: D, dtype: float64

2024-01-01      NaN
2024-01-02      NaN
2024-01-03     30.0
2024-01-04     40.0
2024-01-05     50.0
2024-01-06     60.0
2024-01-07     70.0
2024-01-08     80.0
2024-01-09     90.0
2024-01-10    100.0
Freq: D, dtype: float64

2024-01-01     NaN
2024-01-02     NaN
2024-01-03    10.0
2024-01-04    20.0
2024-01-05    30.0
2024-01-06    40.0
2024-01-07    50.0
2024-01-08    60.0
2024-01-09    70.0
2024-01-10    80.0
Freq: D, dtype: float64
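
Several rolling statistics can also be computed in one call. The following is a minimal sketch using “agg()” on the rolling object with a list of function names; the added standard deviation and the name “rolling_stats” are assumptions for illustration.

```python
import pandas as pd

dates = pd.date_range('2024-01-01', periods=10, freq='D')
data = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100], index=dates)

# Several statistics at once: the result has one column per function
rolling_stats = data.rolling(window=3).agg(['sum', 'max', 'min', 'std'])
print(rolling_stats)
```

The result is a dataframe whose “sum”, “max” and “min” columns match the three separate outputs above.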

This is an original time series analysis educational material created by aicorr.com.

by AICorr Team
