I want to extract specific information from [this csv file][1].

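I need to make a list of the days with the lowest visibility and give an overview of the other time parameters for those days in tabular form. I also need the total number of foggy days.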

  • The second thing you're trying to do (count foggy days) is easy enough, but the first thing that you're trying to do I'm confused about. Will you please elaborate? I'll update my answer below as soon as you clarify. Commented Nov 24, 2021 at 23:46
  • check now. See if it's what you want. Commented Nov 25, 2021 at 0:03

3 Answers

1

You're looking for DataFrame.resample. Given a datetime column (or a datetime index), it groups the rows of the DataFrame into fixed time intervals, such as one group per day.

First, convert the Date/Time column to datetimes, if you haven't already:

df['Date/Time'] = pd.to_datetime(df['Date/Time'])

Get the 5 days with the lowest average visibility:

>>> df.resample(rule='D', on='Date/Time')['Visibility (km)'].mean().nsmallest(5)
Date/Time
2012-03-01    2.791667
2012-03-14    5.350000
2012-12-27    6.104167
2012-01-17    6.433333
2012-02-01    6.795833
Name: Visibility (km), dtype: float64

Basically, that does the following:

  1. Groups all the rows by day
  2. Reduces each group to the mean of its Visibility (km) values for that day
  3. Returns the 5 days with the smallest means
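
To see those three steps on something small, here is a minimal sketch. The rows are made up for illustration and are not taken from the CSV in the question; only the column names Date/Time and Visibility (km) come from it.

```python
import pandas as pd

# Hypothetical observations (two per day), made up for illustration.
df = pd.DataFrame({
    "Date/Time": pd.to_datetime([
        "2012-01-01 00:00", "2012-01-01 12:00",
        "2012-01-02 00:00", "2012-01-02 12:00",
        "2012-01-03 00:00", "2012-01-03 12:00",
    ]),
    "Visibility (km)": [10.0, 8.0, 2.0, 4.0, 6.0, 6.0],
})

# 1. Group rows by calendar day, 2. average the visibility per day.
daily_mean = df.resample(rule="D", on="Date/Time")["Visibility (km)"].mean()

# 3. Keep the days with the smallest daily averages (2 here instead of 5).
lowest = daily_mean.nsmallest(2)
print(lowest)
```

The result is a Series indexed by day, so lowest.index gives you the dates themselves if you need them for further filtering.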

Count the number of foggy days:

>>> df.resample(rule='D', on='Date/Time').apply(lambda x: x['Weather'].str.contains('Fog').any()).sum()
78

Basically, that does the following:

  1. Groups all the rows by day
  2. For each day, yields True if any row in that day contains 'Fog' in the Weather column, and False otherwise
  3. Sums the results. Each True counts as 1, so the sum is the number of foggy days.
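
The same idea can be checked on a tiny frame. Note that this sketch swaps in groupby on the normalized date rather than resample(...).apply(...); for days that have at least one row the grouping is the same. The rows are hypothetical and only the column names come from the question.

```python
import pandas as pd

# Hypothetical observations, made up for illustration.
df = pd.DataFrame({
    "Date/Time": pd.to_datetime([
        "2012-01-01 00:00", "2012-01-01 12:00",
        "2012-01-02 00:00", "2012-01-02 12:00",
        "2012-01-03 00:00", "2012-01-03 12:00",
    ]),
    "Weather": ["Fog", "Freezing Drizzle,Fog", "Clear",
                "Cloudy", "Mostly Cloudy", "Fog"],
})

# True for every row whose Weather text mentions fog.
foggy_hour = df["Weather"].str.contains("Fog")

# One True/False per calendar day: was any row of that day foggy?
foggy_day = foggy_hour.groupby(df["Date/Time"].dt.normalize()).any()

# True counts as 1, so the sum is the number of foggy days.
n_foggy_days = int(foggy_day.sum())
print(n_foggy_days)
```

The first day has two foggy rows but is only counted once, which is the reason for grouping by day before counting.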

6 Comments

Ah, I see! Let me fix that...
@John check again.
Oh sorry, yes, I can do that for you. Yeah, df.resample is really useful.
I think I've finally understood you @John :) Take another look at my answer.
Great!! I'm glad it helped you! (Remember to accept the answer :)
0

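This will get you an array of all unique foggy days. You can use its shape attribute (or len) to get the count. Since Date/Time is hourly, the time part has to be dropped before calling unique, otherwise you get unique foggy hours instead of days.

 pd.to_datetime(df[df["Weather"].apply(lambda x: "Fog" in x)]["Date/Time"]).dt.normalize().unique()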

0

I need to make a list of days with lowest visibility and give an overview of other time parameters for those days in tabular form.

Since your Date/Time column represents a particular hour, you'll need to do some grouping to get the minimum visibility for a particular day. The following will find the 5 least-visible days.

>>> import pandas

# Extract the date from the "Date/Time" column
>>> data["Date"] = pandas.to_datetime(data["Date/Time"]).dt.date

# Group on the new "Date" column and get the minimum values of
# each column for each group.
>>> min_by_day = data.groupby("Date").min()

# Now we can use nsmallest, since 1 row == 1 day in min_by_day.
# Since `nsmallest` returns a pandas.DataFrame with "Date" as the index,
# we have to use `.index` to pull the date objects from the result.
>>> least_visible_days = min_by_day.nsmallest(5, "Visibility (km)").index

Then you can limit your original dataset to the least-visible days with

data[data["Date"].isin(least_visible_days)]
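
For reference, here is the whole pipeline as one runnable sketch. The rows are hypothetical (made up for illustration, not from the question's CSV), and it keeps 2 days instead of 5 so the result is easy to check by eye.

```python
import pandas as pd

# Hypothetical observations, made up for illustration.
data = pd.DataFrame({
    "Date/Time": pd.to_datetime([
        "2012-01-01 00:00", "2012-01-01 12:00",
        "2012-01-02 00:00", "2012-01-02 12:00",
        "2012-01-03 00:00", "2012-01-03 12:00",
    ]),
    "Visibility (km)": [10.0, 8.0, 0.2, 4.0, 6.0, 1.0],
})

# Extract the calendar date from each timestamp.
data["Date"] = data["Date/Time"].dt.date

# One row per day, holding the minimum of each column for that day.
min_by_day = data.groupby("Date").min()

# Dates of the days whose minimum visibility is lowest.
least_visible_days = min_by_day.nsmallest(2, "Visibility (km)").index

# All original rows that fall on those days, as a table.
overview = data[data["Date"].isin(least_visible_days)]
print(overview)
```

Note that this ranks days by their single worst hour (the daily minimum). If you would rather rank by the daily average, replace .min() with .mean(numeric_only=True).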

I also need the total number of foggy days.

We can use the extracted date in this case too:

# Extract the date from the "Date/Time" column
>>> data["Date"] = pandas.to_datetime(data["Date/Time"]).dt.date

# Filter on hours which have foggy weather
>>> foggy = data[data["Weather"].str.contains("Fog")]

# Count number of unique days
>>> len(foggy["Date"].unique())
