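Upsampling means increasing the frequency of time series data, for example converting observations recorded every few days into daily observations. It is usually done when we need a higher resolution or more frequent observations. Because upsampling creates time steps that have no recorded value, the new rows must be filled in. pandas supports several ways of doing so, including linear interpolation, nearest neighbor interpolation, and polynomial interpolation.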
DataFrame.resample(rule, *args, **kwargs)
DataFrame.asfreq(freq, method=None)
DataFrame.interpolate(method='linear', axis=0, limit=None, inplace=False, limit_direction='forward', limit_area=None)
resample is a pandas method for resampling time series data. It is called on a DataFrame with a datetime-like index and takes the rule parameter, which specifies the target frequency (for example 'D' for daily or 'W' for weekly). The call itself does not produce new values: it returns a Resampler object, and the aggregation or fill step (such as mean, max, asfreq, or ffill) is then called on that object. Additional arguments (*args) and keyword arguments (**kwargs) customize how the time bins are formed, such as which bin edge is closed and which edge is used as the label.
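The asfreq method converts the time series to a given frequency without aggregating. Called directly on a DataFrame, it takes the freq parameter, which is the frequency string for the output, and an optional method parameter that controls how the rows introduced by the conversion are filled: forward filling ('ffill') or backward filling ('bfill'). If method is left as None, the new rows contain NaN. Called on the result of resample, as in df.resample('D').asfreq(), the frequency is already set by resample, and the new rows are left as NaN so that they can be filled afterwards, for instance with interpolate.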
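As a small illustration of the method parameter described above, the following sketch calls asfreq directly on a DataFrame, once without filling and once with forward filling. The variable names (daily_nan, daily_ffill) are chosen here for illustration only.

```python
import pandas as pd

# Three observations on non-consecutive dates (illustrative data).
df = pd.DataFrame(
    {"Value": [10, 20, 30]},
    index=pd.to_datetime(["2023-06-01", "2023-06-03", "2023-06-06"]),
)

# Convert to daily frequency; rows that did not exist before hold NaN.
daily_nan = df.asfreq("D")

# Same conversion, but each new row copies the last known observation.
daily_ffill = df.asfreq("D", method="ffill")

print(daily_nan)
print(daily_ffill)
```

Forward filling never invents intermediate values, which makes it a reasonable choice for quantities that stay constant until the next recorded change.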
In the example below, we have a time series DataFrame with three observations on non-consecutive dates. We convert the 'Date' column to datetime format and set it as the index. The resample function upsamples the data to a daily frequency ('D'), and asfreq inserts the missing days as NaN rows. Finally, the interpolate method with the 'linear' option fills those gaps by linear interpolation between the surrounding data points. The resulting DataFrame, df_upsampled, contains the upsampled time series with interpolated values.
import pandas as pd

# Create a sample time series DataFrame
data = {'Date': ['2023-06-01', '2023-06-03', '2023-06-06'],
        'Value': [10, 20, 30]}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

# Upsample the data using linear interpolation
df_upsampled = df.resample('D').asfreq().interpolate(method='linear')

# Print the upsampled DataFrame
print(df_upsampled)
                Value
Date
2023-06-01  10.000000
2023-06-02  15.000000
2023-06-03  20.000000
2023-06-04  23.333333
2023-06-05  26.666667
2023-06-06  30.000000
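The interpolate signature listed earlier also includes a limit parameter, which caps how many consecutive NaN values are filled in each gap. The sketch below reuses the same three observations to show its effect; the name limited is illustrative, and limit=1 is an arbitrary choice for the demonstration.

```python
import pandas as pd

# Same three observations as in the linear interpolation example.
df = pd.DataFrame(
    {"Value": [10, 20, 30]},
    index=pd.to_datetime(["2023-06-01", "2023-06-03", "2023-06-06"]),
)

# Insert the missing days as NaN rows.
gaps = df.resample("D").asfreq()

# Fill at most one consecutive NaN per gap, working forward in time.
limited = gaps.interpolate(method="linear", limit=1)

# 2023-06-05 is the second missing day in its gap, so it stays NaN.
print(limited)
```

Capping the fill length can be useful when long gaps should remain visibly missing rather than be bridged by a straight line.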
Nearest neighbor interpolation is a simple method that fills each gap with the value of the closest available observation in time. Because it never creates values that were not observed, it can be useful when the time series changes abruptly and intermediate values would be misleading. The interpolate method in pandas performs nearest neighbor interpolation when called with the 'nearest' option; this option relies on SciPy being installed.
In the example below, we use the same original DataFrame as before. After resampling to the 'D' frequency, the interpolate method with the 'nearest' option fills each gap by copying the nearest available observation. Note that 2023-06-02 is equally distant from both of its neighbors; in the output shown it takes the earlier value. The resulting DataFrame, df_upsampled, has a daily frequency with nearest neighbor interpolation.
import pandas as pd

# Create a sample time series DataFrame
data = {'Date': ['2023-06-01', '2023-06-03', '2023-06-06'],
        'Value': [10, 20, 30]}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

# Upsample the data using nearest neighbor interpolation
df_upsampled = df.resample('D').asfreq().interpolate(method='nearest')

# Print the upsampled DataFrame
print(df_upsampled)
            Value
Date
2023-06-01   10.0
2023-06-02   10.0
2023-06-03   20.0
2023-06-04   20.0
2023-06-05   30.0
2023-06-06   30.0
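As a possible alternative, the Resampler object returned by resample also has its own nearest method, which fills each new row from the closest original timestamp. The sketch below applies it to the same data; the name nearest_daily is illustrative. A date that is equally distant from two observations (here 2023-06-02) may be resolved differently than with interpolate, so it is best to check such ties on your own data.

```python
import pandas as pd

# Same three observations as in the nearest neighbor example.
df = pd.DataFrame(
    {"Value": [10, 20, 30]},
    index=pd.to_datetime(["2023-06-01", "2023-06-03", "2023-06-06"]),
)

# Upsample to daily frequency, taking each new row's value from the
# original observation whose timestamp is closest.
nearest_daily = df.resample("D").nearest()

print(nearest_daily)
```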
DataFrame.mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
Mean Downsampling
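In the example below, we start with a daily time series DataFrame spanning the entire month of June 2023. The resample function with the 'W' frequency downsamples the data to weekly intervals, each labeled by the Sunday that ends the week. By applying the mean method, we obtain the average value within each week. Because June 2023 starts on a Thursday and ends on a Friday, the first and last weeks are partial and average only four and five days respectively. The resulting DataFrame, df_downsampled, contains the mean-downsampled time series data.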
import pandas as pd

# Create a sample time series DataFrame with daily frequency
data = {'Date': pd.date_range(start='2023-06-01', end='2023-06-30', freq='D'),
        'Value': range(30)}
df = pd.DataFrame(data)
df.set_index('Date', inplace=True)

# Downsampling using mean
df_downsampled = df.resample('W').mean()

# Print the downsampled DataFrame
print(df_downsampled)
            Value
Date
2023-06-04    1.5
2023-06-11    7.0
2023-06-18   14.0
2023-06-25   21.0
2023-07-02   27.0
Maximum Downsampling
In the example below, we start with the same daily time series DataFrame spanning the entire month of June 2023. The resample function with the 'W' frequency downsamples the data to weekly intervals. By applying the max method, we obtain the maximum value within each week. The resulting DataFrame, df_downsampled, contains the maximum-downsampled time series data.
import pandas as pd

# Create a sample time series DataFrame with daily frequency
data = {'Date': pd.date_range(start='2023-06-01', end='2023-06-30', freq='D'),
        'Value': range(30)}
df = pd.DataFrame(data)
df.set_index('Date', inplace=True)

# Downsampling using max
df_downsampled = df.resample('W').max()

# Print the downsampled DataFrame
print(df_downsampled)
            Value
Date
2023-06-04      3
2023-06-11     10
2023-06-18     17
2023-06-25     24
2023-07-02     29
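The mean and maximum can also be computed in a single pass by handing both function names to agg on the resampled column. This is a minimal sketch on the same June 2023 data; the name weekly is illustrative.

```python
import pandas as pd

# Daily series for June 2023 with values 0..29.
df = pd.DataFrame(
    {"Value": range(30)},
    index=pd.date_range(start="2023-06-01", end="2023-06-30", freq="D"),
)

# One row per week (weeks end on Sunday), one column per aggregation.
weekly = df["Value"].resample("W").agg(["mean", "max"])

print(weekly)
```

Computing both statistics together keeps them aligned on the same weekly index, which makes it easier to compare them side by side.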