## Problem Description
I'm working on a time series dataset of sensor readings collected every 5 minutes over 3 months. The dataset has approximately 15% missing values scattered throughout, but some gaps are as large as 2-3 hours.
## Current Approach
I'm currently using forward fill to handle missing values:

```python
import pandas as pd

# Note: fillna(method='ffill') is deprecated in recent pandas; use ffill() directly.
df['sensor_reading'] = df['sensor_reading'].ffill()
```
## The Issue
Forward filling works for small gaps (5-10 minutes), but for larger gaps (1-3 hours), it creates unrealistic flat patterns that don't represent actual sensor behavior. This is affecting my downstream analysis and predictions.
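To quantify what counts as a "large" gap, I group consecutive NaNs into runs and measure each run's length. Here's a minimal sketch on toy data (a 5-minute grid with one 15-minute gap; the values are made up, not from my real dataset):

```python
import pandas as pd
import numpy as np

# Toy series on a 5-minute grid with one run of 3 consecutive NaNs.
idx = pd.date_range("2024-01-01", periods=8, freq="5min")
s = pd.Series([23.5, 23.7, np.nan, np.nan, np.nan, 25.1, 25.0, 24.8], index=idx)

# Assign an id to each run of consecutive NaNs, then count run lengths.
is_na = s.isna()
gap_id = (is_na != is_na.shift()).cumsum()[is_na]
gap_sizes = gap_id.value_counts() * 5  # run length in samples -> minutes

print(gap_sizes.tolist())  # [15]
```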
## What I've Tried
- **Interpolation**: works better but still struggles with large gaps
- **Forward fill with a limit**: leaves NaN values for large gaps
- **Mean/median imputation**: doesn't capture temporal patterns
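For reference, these are roughly the variants I tried, on the same toy series as above (the `limit=2` cutoff of 10 minutes is just an example value):

```python
import pandas as pd
import numpy as np

idx = pd.date_range("2024-01-01", periods=8, freq="5min")
s = pd.Series([23.5, 23.7, np.nan, np.nan, np.nan, 25.1, 25.0, 24.8], index=idx)

# 1. Time-based interpolation: smooth, but just a straight line across long gaps.
interp = s.interpolate(method="time")

# 2. Forward fill with a limit: anything beyond 2 samples (10 min) stays NaN.
limited = s.ffill(limit=2)

# 3. Mean imputation: fills everything but ignores temporal structure entirely.
mean_filled = s.fillna(s.mean())
```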
## Sample Data
```text
timestamp              sensor_reading
2024-01-01 00:00:00    23.5
2024-01-01 00:05:00    23.7
2024-01-01 00:10:00    NaN
2024-01-01 00:15:00    NaN
2024-01-01 00:20:00    NaN
...                    (20 more missing values)
2024-01-01 02:00:00    25.1
```
## Question
What are the best practices for handling large gaps in time series sensor data? Should I:
1. Use different imputation methods based on gap size?
2. Flag large gaps and exclude them from analysis?
3. Use a predictive model (ARIMA/LSTM) to fill gaps?
4. Consider the data quality too poor and collect new data?
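To make the first option concrete, here's the gap-size-aware sketch I'm considering: interpolate runs up to `MAX_GAP` samples, leave longer runs as NaN, and keep a flag column so downstream analysis can exclude them. The `MAX_GAP = 2` threshold (10 minutes) is an arbitrary placeholder:

```python
import pandas as pd
import numpy as np

idx = pd.date_range("2024-01-01", periods=10, freq="5min")
s = pd.Series([23.5, 23.7, np.nan, 23.9, np.nan, np.nan,
               np.nan, np.nan, 25.1, 25.0], index=idx)

MAX_GAP = 2  # fill gaps up to 2 samples (10 min); flag anything longer

# Size of the NaN run each missing point belongs to.
is_na = s.isna()
run_id = (is_na != is_na.shift()).cumsum()
run_size = run_id.map(run_id[is_na].value_counts())

# Interpolate small gaps only; large gaps stay NaN and get a flag.
small_gap = is_na & (run_size <= MAX_GAP)
filled = s.interpolate(method="time").where(small_gap | ~is_na)
large_gap_flag = is_na & ~small_gap
```

Here the single-NaN gap gets interpolated while the 4-sample (20-minute) run stays NaN with `large_gap_flag` set, so it can be masked out later.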
## Environment
- Python 3.10
- Pandas 2.0.3
- Dataset size: ~26,000 rows
Any advice or references to research papers/best practices would be appreciated!