How to use Pandas to split timestamped CSV data into multiple CSVs based on values and continuous time periods

Question

I am trying to analyse a ship's AIS data. I have a CSV with ~20,000 rows, with columns for lat / long / speed / time stamp.

I have loaded the data in a pandas data frame, in a Jupyter notebook.

What I want to do is split the CSV into smaller CSVs based on the time stamp and the speed: I want an individual CSV for each period of time the vessel speed was less than, say, 2 knots. For example, if the vessel transited at 10 knots for 6 hrs, slowed down to 1 knot for 3 hrs, sped back up to 10 knots, then slowed down again to 1 knot for 4 hrs, I would want the output to be two CSVs, one for the 3 hr period and one for the 4 hr period. This is so I can review these periods individually in my mapping software.

I can easily filter the data to show all the rows where the speed is <1 knot, but I can't break them down to output each continuous period as a separate CSV / data frame.

EDIT:

Here is an example of the data

I've tried to show more clearly what I want to achieve here

Please include some sample data and also the code you already have. – ex4, Commented Apr 30, 2020 at 10:33
It would help if you could give an example of the CSV. Probably you need to convert the timestamps to a Python datetime object; after that it would be straightforward to sort and select based on the time and speed. – Bruno Vermeulen, Commented Apr 30, 2020 at 10:33
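A minimal sketch of that suggestion, assuming the CSV has columns named `time` and `speed` (hypothetical names; the real AIS export may use others) and timestamps that pandas can parse. The inline text only stands in for the real file:

```python
import io

import pandas as pd

# Hypothetical stand-in for the AIS CSV; a real file path would go here instead.
csv_text = """time,speed
2020-04-01 00:10,9.8
2020-04-01 00:00,10.1
2020-04-01 00:20,1.2
"""

# parse_dates converts the 'time' strings to datetime64 values while reading.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['time'])

# Sort chronologically so row-to-row time differences are meaningful.
df = df.sort_values('time').reset_index(drop=True)

# Time-based selection now works directly on the datetime column.
first_10_min = df[df['time'] < pd.Timestamp('2020-04-01 00:10')]
```

Sorting matters because the gap-based splitting in the answer compares each row with the one before it.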

jon · Accepted Answer · 2020-04-30 13:04:38Z


Here is something to maybe get you started.

First, keep only the rows that meet the criterion (for example, a speed of 2 or below):

import pandas as pd

df = pd.DataFrame({'speed':[2,1,4,5,4,1,1,1,3,4,5,6], 'time':[4,5,6,7,8,9,10,11,12,13,14,15]})
df_below2 = df[df['speed']<=2].reset_index(drop=True)  # slow rows only, renumbered from 0

Now split the frame wherever the time gap between consecutive slow rows is too long. For example:

threshold = 2  # largest time gap still treated as the same period
df_below2['not_continuous'] = df_below2['time'].diff() > threshold  # True where a new period starts
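With real timestamps instead of the integer `time` values used here, the threshold can be a `pd.Timedelta`. A sketch, assuming a `time` column already parsed to datetimes and a 30-minute cut-off (an arbitrary value to tune to the AIS sampling rate):

```python
import pandas as pd

# Hypothetical slow rows: two pings close together, then two more about 3 h later.
df_below2 = pd.DataFrame({
    'time': pd.to_datetime(['2020-04-01 00:00', '2020-04-01 00:05',
                            '2020-04-01 03:00', '2020-04-01 03:05']),
    'speed': [1.0, 1.5, 0.8, 1.1],
})

# Largest gap between consecutive slow rows still counted as one period (assumed).
threshold = pd.Timedelta(minutes=30)

# diff() on datetimes yields Timedeltas; the first row's NaT compares as False.
df_below2['not_continuous'] = df_below2['time'].diff() > threshold
```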

Distinguish between the groups using cumsum: each True starts a new group, so the running count gives every continuous period its own id.

df_below2['group_id'] = df_below2['not_continuous'].cumsum()

From here it should be easy to split the frame based on the group id.
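One way to do that last step is to loop over `groupby('group_id')` and write each group to its own file. A sketch on the toy data from this answer; the output folder and file-name pattern are hypothetical choices:

```python
from pathlib import Path

import pandas as pd

df = pd.DataFrame({'speed': [2, 1, 4, 5, 4, 1, 1, 1, 3, 4, 5, 6],
                   'time': [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]})
df_below2 = df[df['speed'] <= 2].reset_index(drop=True)
# Same gap-based labelling as above, condensed into one line (threshold = 2).
df_below2['group_id'] = (df_below2['time'].diff() > 2).cumsum()

out_dir = Path('slow_periods')  # hypothetical output folder
out_dir.mkdir(exist_ok=True)

written = []
for group_id, period in df_below2.groupby('group_id'):
    # One CSV per continuous slow period; the helper column is dropped first.
    name = f'slow_period_{group_id}.csv'
    period.drop(columns='group_id').to_csv(out_dir / name, index=False)
    written.append(name)

print(written)  # ['slow_period_0.csv', 'slow_period_1.csv']
```

Here the slow rows at times 4 and 5 land in the first file and those at times 9, 10 and 11 in the second, which matches the "one CSV per slow period" output the question asks for.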

edited Apr 30, 2020 at 13:04

answered Apr 30, 2020 at 10:34

jon


4 Comments

Mhargr Over a year ago

When I follow the example in the link, it shows me how to filter based on values and output a new CSV, but I can't see how it lets me make multiple CSVs based on the continuous data.

jon Over a year ago

For continuous data use, for example, df2 = df.loc[(df['column_name'] >= A)]. df2 contains all rows in df where 'column_name' is greater than or equal to A.

Mhargr Over a year ago

I think this only lets me create a single df where, say, the speed is < 2 for the whole data set. What I want is each continuous period where the speed is below 2 as a separate CSV. My data set samples the speed every few minutes and covers nearly six months.

Mhargr Over a year ago

Thank you, I am playing around with this idea, not cracked it yet though :)
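Because the data is sampled every few minutes, another option is to treat each unbroken run of consecutive slow rows as one period, with no time threshold at all. This swaps in a different technique from the answer's gap test (run labelling with `shift` and `cumsum`). A sketch on synthetic hourly data built from the question's example; `time` and `speed` are assumed column names, and the 2 h fast leg between the slow periods is an assumed value since the question gives no duration:

```python
import pandas as pd

# Hourly samples: 6 h at 10 kn, 3 h at 1 kn, 2 h at 10 kn (assumed), 4 h at 1 kn.
speeds = [10] * 6 + [1] * 3 + [10] * 2 + [1] * 4
df = pd.DataFrame({
    'time': pd.date_range('2020-04-01', periods=len(speeds), freq='h'),
    'speed': speeds,
})

slow = df['speed'] < 2
# A new run starts whenever the slow/fast state differs from the previous row.
run_id = slow.ne(slow.shift()).cumsum().rename('run_id')

# Keep only the slow runs, one DataFrame per run.
periods = [group for _, group in df[slow].groupby(run_id[slow])]

for i, period in enumerate(periods):
    print(i, period['time'].iloc[0], period['time'].iloc[-1], len(period))
```

The trade-off: this version does not notice a gap in the recording itself, so if the receiver drops out for a day while the ship stays slow, the rows before and after the outage end up in the same period. The answer's time-gap test catches that case.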
