Pandas Time Series Analysis - Part 2: date_range


Hey everyone! In this post, we’re continuing our journey into time series analysis with Pandas by exploring the powerful date_range function. If you haven’t already checked out our previous post on DatetimeIndex, I recommend giving that a look first. It sets the foundation for what we’re doing here and helps you understand how date-related operations can transform your data analysis workflow.

So, let’s dive in!


Why We Need date_range

Imagine you have a CSV file of stock prices that, for some reason, has no date column. That’s a problem: time series data is close to meaningless if we don’t know when each record was observed. Pandas provides date_range to generate a sequence of dates that we can then attach to our data. This is handy whether you’re skipping weekends, creating synthetic test data, or just making sure your dataset has a proper timeline.

For example, let’s say we have Apple stock prices for June 2017, but there's no date column. By using date_range, we can easily generate a list of dates corresponding to business days in that month and then set those as our DataFrame's index. This is way better than having some arbitrary integer index that doesn’t convey any time information.


Setting up the Jupyter Notebook

Before starting, you might want to launch your Jupyter notebook:

jupyter notebook

Create a new notebook, and let's import the required libraries:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

(This sets us up to handle data with Pandas, generate numbers with NumPy, and plot with Matplotlib.)


Scenario: Data Missing Dates

Let’s say we have a CSV file (stock_prices_no_dates.csv) that looks something like this (no Date column):

Open,High,Low,Close,Volume
153.63,155.45,153.06,155.07,26000000
154.34,156.10,153.90,155.96,20000000
...

We want to add a date column for the entire month of June 2017, excluding weekends. Here’s how we load the CSV into a DataFrame:

df = pd.read_csv("stock_prices_no_dates.csv")
print(df.head())

You’ll notice it just has the columns Open, High, Low, Close, and Volume, plus a default 0-based integer index. That’s not much help for time series analysis.


Generating Dates with date_range

Basic Usage

We can use date_range to create a sequence of dates:

pd.date_range(start='2017-06-01', 
              end='2017-06-30', 
              freq='B')  # 'B' stands for Business Day
  • start: The first date in the range.

  • end: The last date in the range.

  • freq: How often you want to generate the timestamps. 'B' means every weekday (excluding weekends).

If you print out this range, you’ll see:

DatetimeIndex(['2017-06-01', '2017-06-02', '2017-06-05', '2017-06-06', ...],
              dtype='datetime64[ns]', freq='B')

Notice that June 3rd and 4th (Saturday and Sunday) are missing, which is exactly what freq='B' is designed for. This is awesome because we usually only care about business days for stock data.
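As a quick sanity check, the sketch below counts the business days in June 2017 and confirms that no weekend slipped in. The variable names are illustrative only.

```python
import pandas as pd

# Business days in June 2017 (weekends are skipped, holidays are NOT)
june_bdays = pd.date_range(start="2017-06-01", end="2017-06-30", freq="B")

# dayofweek uses Monday=0 ... Sunday=6, so Saturday/Sunday are >= 5
n_days = len(june_bdays)
has_weekend = bool((june_bdays.dayofweek >= 5).any())

print(n_days)       # 22
print(has_weekend)  # False
```

Knowing the expected count up front is useful, because the CSV needs exactly that many rows for the dates to line up.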

Setting the Date as the DataFrame Index

Now that we have our date range, we can set it as the index for the DataFrame:

date_index = pd.date_range(start='2017-06-01', 
                           end='2017-06-30', 
                           freq='B')

df['Date'] = date_index  # Insert the dates into a column first
df.set_index('Date', inplace=True)  # Set that column as the index
print(df.head())

Or, you can set the index directly while assigning:

df.index = date_index

From here on, the DataFrame will know it’s dealing with dates, which means it can handle all the time series magic we know and love (like slicing, plotting, and resampling).
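One caveat: both assignments raise a ValueError if the number of generated dates differs from the number of rows. A possible workaround, assuming the rows are in chronological order with no business day missing, is to let the row count drive the range through periods. The small DataFrame below is invented stand-in data, not the real CSV.

```python
import pandas as pd

# Stand-in for the CSV: five rows of made-up closing prices
prices = pd.DataFrame({"Close": [155.07, 155.96, 154.50, 153.80, 156.20]})

# Let the number of rows decide how many business days to generate
prices.index = pd.date_range(start="2017-06-01", periods=len(prices), freq="B")
prices.index.name = "Date"

print(prices)
```

With five rows, the index runs from Thursday, June 1 to Wednesday, June 7, skipping the weekend in between.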


Visualizing Time Series Data

One huge benefit of having a datetime index is easier plotting:

# Make plots appear inline in Jupyter notebooks
# (keep this comment on its own line; text after a magic command is parsed as an argument)
%matplotlib inline
df['Close'].plot(title="Closing Prices for June 2017")
plt.show()

Now the x-axis will be labeled with actual dates instead of just integers. This makes spotting trends or patterns much more natural.


Partial Date Selection

Sometimes you only need a portion of your data. Suppose you just want to see the first 10 calendar days of June (both endpoints of a date slice are included):

df_partial = df.loc['2017-06-01':'2017-06-10']
print(df_partial)

Or if you only need the closing prices for those days, you could do:

df_partial_close = df.loc['2017-06-01':'2017-06-10', 'Close']
print(df_partial_close)

This is so much more intuitive than trying to slice by integer index positions.
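To see what such a slice returns without the CSV at hand, here is a small sketch on a made-up DataFrame (the Close values are just a counter). It also shows partial string indexing, where a string like "2017-06" selects the whole month.

```python
import numpy as np
import pandas as pd

idx = pd.date_range(start="2017-06-01", end="2017-06-30", freq="B")
demo = pd.DataFrame({"Close": np.arange(len(idx), dtype=float)}, index=idx)

# Label-based date slices include both endpoints
first_ten = demo.loc["2017-06-01":"2017-06-10"]
print(len(first_ten))    # 7 (only seven business days fall in that window)

# Partial string: every row whose timestamp lies in June 2017
whole_month = demo.loc["2017-06"]
print(len(whole_month))  # 22
```

Note that June 10, 2017 is a Saturday, so it is not in the index; the slice still works because the index is sorted.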


Filling in Missing Dates (Weekends, Holidays, Etc.)

Using asfreq with "Padding"

Sometimes you do want weekend or holiday data in your time series, even if there were no actual trades on those days. For example, maybe you want to see all calendar days and “carry forward” the last known price for days with no trading activity. Pandas can do that with asfreq:

df_daily = df.asfreq('D', method='pad')  # 'D' includes every day
print(df_daily.head(10))
  • freq='D': Daily frequency, including weekends.

  • method='pad': Also known as forward fill, it carries the last known value forward to fill the missing days.

This is useful for many analysis techniques where continuous data (no missing dates) is required.
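A tiny example makes the effect of padding visible. The prices below are invented; the index covers a Friday, the following Monday, and the Tuesday after that.

```python
import pandas as pd

# Fri 2 June, Mon 5 June, Tue 6 June 2017 (made-up closing prices)
idx = pd.to_datetime(["2017-06-02", "2017-06-05", "2017-06-06"])
close = pd.Series([155.45, 153.93, 154.45], index=idx)

# Without a fill method, the new weekend rows are NaN
gaps = close.asfreq("D")

# With method='pad', Saturday and Sunday repeat Friday's value
daily = close.asfreq("D", method="pad")

print(gaps)
print(daily)
```

Whether forward filling is appropriate depends on the analysis: it is reasonable for prices, but it would overstate volume if applied to a column like Volume.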

Other Frequencies

asfreq and date_range both support a ton of frequencies:

  • W: Weekly

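  • h: Hourly (written H in older code; the uppercase alias is deprecated since pandas 2.2)

  • min: Minute (older code uses T, which is also deprecated)

  • s: Secondly (older code uses S, which is also deprecated)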

  • and so on...

Try them out if you ever need to resample your data to different time intervals!
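Here is a short sketch of a few of these frequencies side by side. It uses the lowercase spellings 'h', 'min', and 's', which recent pandas versions prefer; to my knowledge older versions accept them as well, but treat that as an assumption and check against your installed version.

```python
import pandas as pd

start = "2017-06-01"  # a Thursday

weekly = pd.date_range(start, periods=3, freq="W")      # anchored on Sundays
hourly = pd.date_range(start, periods=3, freq="h")
minutely = pd.date_range(start, periods=3, freq="min")
secondly = pd.date_range(start, periods=3, freq="s")

for name, rng in [("W", weekly), ("h", hourly), ("min", minutely), ("s", secondly)]:
    print(name, list(rng.strftime("%Y-%m-%d %H:%M:%S")))
```

The weekly range starts on June 4 rather than June 1, because 'W' snaps to the next Sunday when the start date is not one.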


Generating Time Series with periods

In some scenarios, you may not know your end date, but you know how many periods (time steps) you need:

rng = pd.date_range(start='2017-01-01', periods=10, freq='D')
print(rng)

This will create 10 consecutive days starting from January 1, 2017.

Alternatively, if you want 24 points with hourly frequency:

# 'h' is the hourly alias; the older uppercase 'H' is deprecated since pandas 2.2
rng_hourly = pd.date_range(start='2017-01-01', periods=24, freq='h')
print(rng_hourly)
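The periods argument also works in the other direction: give date_range an end date instead of a start, and it counts backwards. A minimal sketch, with an illustrative variable name:

```python
import pandas as pd

# The last five business days up to and including Friday, 30 June 2017
last_five = pd.date_range(end="2017-06-30", periods=5, freq="B")
print(last_five)
```

This is handy when you know when a dataset ends and how many rows it has, but not when it starts.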

Creating Synthetic Time Series for Testing

A common need is generating fake data for test purposes. You can pair Pandas date_range with NumPy’s random number functions:

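# Generate an hourly date range for 3 days, total 72 periods
# ('h' is the hourly alias; the older uppercase 'H' is deprecated since pandas 2.2)
rng = pd.date_range(start='2017-01-01', periods=72, freq='h')

# Create random integers from 1 to 9 (high is exclusive)
random_values = np.random.randint(low=1, high=10, size=len(rng))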

# Create a series with the date range as index
ts = pd.Series(data=random_values, index=rng)
print(ts.head(10))

Now you have a random time series you can use for demos, prototype analyses, or unit tests. You can even resample it to daily or weekly data to see how the values aggregate over time.
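As a sketch of that resampling step, the snippet below rebuilds a 72-hour series and aggregates it by day. It uses a seeded NumPy generator (an addition of mine, so that repeated runs give the same numbers) rather than np.random.randint.

```python
import numpy as np
import pandas as pd

rng = pd.date_range(start="2017-01-01", periods=72, freq="h")

# Seeded generator so the example is repeatable; values are integers 1..9
gen = np.random.default_rng(seed=0)
ts = pd.Series(gen.integers(low=1, high=10, size=len(rng)), index=rng)

# Collapse 72 hourly points into 3 daily buckets
daily_total = ts.resample("D").sum()
daily_mean = ts.resample("D").mean()

print(daily_total)
print(daily_mean)
```

Each daily bucket holds 24 hourly values, so the three daily totals add up to the sum of the original series.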


Handling Holidays

The business day frequency, freq='B', only excludes weekends. (It is not the default, by the way: without a freq argument, date_range produces daily dates.) It doesn’t know about country-specific holidays (e.g., July 4th in the USA or other national holidays). For that, Pandas offers the CustomBusinessDay offset, which can be combined with a holiday calendar, but that’s a deeper topic we’ll dive into another time. Just keep in mind that if your data needs to skip certain holidays, you’ll have to set up a custom calendar.
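As a small preview, here is a minimal sketch using the US federal holiday calendar that ships with Pandas. Note the assumption: federal holidays are not the same as stock exchange holidays, so real market data would likely need a different calendar.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay

# Business days that also skip US federal holidays
us_bday = CustomBusinessDay(calendar=USFederalHolidayCalendar())

# The week of Monday 3 July to Friday 7 July 2017
plain = pd.date_range(start="2017-07-03", end="2017-07-07", freq="B")
custom = pd.date_range(start="2017-07-03", end="2017-07-07", freq=us_bday)

print(len(plain))   # 5
print(len(custom))  # 4
```

The plain 'B' range still contains Tuesday, July 4, while the custom range drops it.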


Real-World Use Cases

  1. Stock Market Analysis: If your trading data is missing specific dates, date_range helps patch things up with accurate business-day frequencies.

  2. IoT Sensor Data: When sensors go offline or skip transmissions, a full timeline built with date_range (applied through reindex, or directly with asfreq) exposes the gaps so you can fill them and get a continuous series.

  3. Forecasting: Many forecasting models require evenly spaced data. Generating a uniform time series can be crucial when preparing your dataset.

  4. Data Testing & Prototyping: Quickly spin up synthetic data for practice, demos, or testing by pairing date_range with np.random.
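For the sensor case in particular, one possible approach is to build the full expected timeline with date_range and reindex the readings against it. The timestamps and readings below are hypothetical.

```python
import pandas as pd

# Hypothetical per-minute readings; 00:02 and 00:03 never arrived
stamps = pd.to_datetime(
    ["2017-01-01 00:00", "2017-01-01 00:01", "2017-01-01 00:04"]
)
readings = pd.Series([20.1, 20.3, 20.9], index=stamps)

# The timeline we expected: one timestamp per minute
full_index = pd.date_range(start=stamps.min(), end=stamps.max(), freq="min")

with_gaps = readings.reindex(full_index)  # missing minutes become NaN
filled = with_gaps.ffill()                # carry the last reading forward

print(with_gaps.isna().sum())  # 2
print(filled)
```

Keeping the NaN version around is often worthwhile, since it records which values were measured and which were filled in.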
