0

I am trying to analyse a ships AIS data. I have a CSV with ~20,000 rows, with columns for lat / long / speed / time stamp.

I have loaded the data in a pandas data frame, in a Jupyter notebook.

What I want to do is split the CSV into smaller CSVs based on the time stamp and the speed, so I want an individual CSV for each period of time the vessel speed was less than say 2 knots, eg if the vessel transited at 10 knots for 6hrs, then slowed down to 1 knot for a period of 3 hrs, sped back up 10 knots, then slowed down again to 1 knot for a period of 4 hrs, I would want to the output to be two CSVs, one for the 3hr period and one for the 4hr period. This is so I can review these periods individually in my mapping software.

I can filter the data easily to show all the periods where it is <1 knot but I can't break it down to output the continuous periods as separate CSVs / data frames. EDIT

Here is an example of the data

I've tried to show more clearly what I want to achieve here

3
  • Please, include some sample data and also code you already have. Commented Apr 30, 2020 at 10:33
  • Would help is you can give an example of the CSV. Probably you need to convert the timestamps to a Python datetime object after that if would be straightforward to sort and select based on the time and speed. Commented Apr 30, 2020 at 10:33
  • I've added some of the data I'm using, thanks for helping Commented Apr 30, 2020 at 11:04

1 Answer 1

0

Here is something to maybe get you started.

First filter out all values that meets the criteria (for example below 2):

df = pd.DataFrame({'speed':[2,1,4,5,4,1,1,1,3,4,5,6], 'time':[4,5,6,7,8,9,10,11,12,13,14,15]})
df_below2 = df[df['speed']<=2].reset_index(drop=True)

Now we need to split the frame if there is too long gap btw values in time. For example:

threshold = 2
df_below2['not_continuous']  = df_below2['time'].diff() > threshold

Distinguish between the groups using cums:

df_below2['group_id'] = df_below2['not_continuous'].cumsum()

From here it should be easy to split the frame based on the group id.

Sign up to request clarification or add additional context in comments.

4 Comments

When I follow the example in the link it shows me how to filter based on values and out put as a new CSV, I can't see how it lets me make the multiple CSVs based on the continuous data though?
For continuous data use for example, df2 = df.loc[(df['column_name'] >= A)]. df2 contains all rows in df where 'column_name' weakly exceeds A.
I think this only lets me create a single df where say the speed is < 2 for the whole data set. What I want is each continuous period that the speed is below <2 as a separate CSV. My data set samples the speed every few minute and the data set covers nearly six months of time.
Thank you, I am playing around with this idea, not cracked it yet though :)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.