2

I have a pandas dataframe that looks like so:

   datetime         Online   TEST
61 2018-03-03       True   False
62 2018-03-04       True   False
63 2018-03-05       True   False
64 2018-03-06       True   False
65 2018-03-07       True   False
66 2018-03-08       True   False
67 2018-03-09       True   False
68 2018-03-10       True   False
69 2018-03-11       False  False
70 2018-03-12       False  False

I need to check that for each False in the TEST column, that within a date range of 7 days that there is a False in the Online column. For example, on 2018-03-03, since TEST is False, I would want to check all plus or minus 7 days (ie plus or minus timedelta(days = 7)) for False values in the Online column. So since there are no False Online values within a 7 day time frame, then we would return a False. On the other hand, consider the date 2018-03-09, where Online is True and TEST is False. Since there is a False in Online on the day 2018-03-11, I need to return a boolean True value saying that there was a False within my 7 day time range.

I can achieve this using some slow and ugly looping mechanisms (ie go through each row using DataFrame.iterrows(), check if TEST is false, then pull the time window of plus or minus 7 days to see if Online also has a corresponding False value. But I would ideally like to have something snazzier and faster. For a visual, this is what I need my final dataframe to look like:

   datetime         Online   TEST   Check
61 2018-03-03       True   False    False
62 2018-03-04       True   False    True
63 2018-03-05       True   False    True
64 2018-03-06       True   False    True
65 2018-03-07       True   False    True
66 2018-03-08       True   False    True
67 2018-03-09       True   False    True
68 2018-03-10       True   False    True
69 2018-03-11       False  False    True
70 2018-03-12       False  False    True

Any ideas out there? Thanks in advance!

6
  • 1
    Is datetime field always in sorted order? Commented Apr 28, 2020 at 19:39
  • In my case yes, I sorted it before creating the current dataframe. Commented Apr 28, 2020 at 19:40
  • 1
    Is it expected that all dates are present? Do you want +/- 7 rows or do you want +/- 7 days? If all days are present then those are the same. Just asking for clarification. Commented Apr 28, 2020 at 19:58
  • All dates are present in the full dataframe, so technically yes +/- 7 days is equivalent to +/- 7 rows in my specific case. Commented Apr 28, 2020 at 20:00
  • 1
    See if this does what you want df.Online.rolling(15, center=True, min_periods=1).apply(np.nanmin).eq(0) Commented Apr 28, 2020 at 20:42

1 Answer 1

1

Building upon great @piRSquared comments (I didn't even know about the rolling method, it seems very useful!), you can use

check = ~(df.TEST + df.Online.rolling(15, center=True, min_periods=1).apply(np.prod).eq(1))

The second summand creates a Series object in which every element is a boolean indicating if there is not any False value in a window of size 15; this is achieved by multiplying (NumPy's prod function) all the values inside this rolling window.

The sum (with the logical inverse operator ~) is what compares booleans so we only get True in the Check series if there are two False in both columns (element-wise, of course).

Hope it helps.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.