1

I have a data frame that has a bunch of data in it. I have a column that is a datetime64. The column looks like (this is a small chunk. I have a total of 650 rows):

Time
0      2019-09-04 23:44:52
1      2019-09-02 20:38:40
2      2019-09-02 18:06:50
3      2019-09-04 20:18:06
4      2019-09-03 00:32:32

I then have a list with some benchmark days that looks like:

2019-08-30 23:59:59
2019-08-31 23:59:59
2019-09-01 23:59:59
2019-09-02 23:59:59
2019-09-03 23:59:59
2019-09-04 23:59:59
2019-09-05 23:59:59
2019-09-06 23:59:59
2019-09-07 23:59:59
2019-09-08 23:59:59
2019-09-09 23:59:59

I want to check to see how many days in my column lay in between each checkmark on my list. I could not get things working using .count or .sum. I also tried just finding the number of entries in the column that were greater than or equal each item in my list but it is but I get 650 for all of them. Not sure what I am doing wrong. My last method looked something like:

count = []
i =0
for val in dates:
   value = (df[column_name] >= val).count()
   counts[i] = value
   i = i+1

I would rather do something like

count = []
i =0
while i < len(dates)-1:
   value = (df['Time'] >= dates[i] && df['Time'] < dates[i+1]).count()
   counts[i] = value
   i = i+1
value = (df['Time'] >= dates[i])
counts[i] = value

But I am not sure how to accomplish this. Some help would be greatly appreciated.

I would like to end up with a list counts that contains the count that lies in the ranges using the dates inside dates.

Just looking at the days I provided the output would be something like: Days between

2019-08-30 23:59:59 - 2019-08-31 23:59:59 = 0
2019-08-31 23:59:59 - 2019-09-01 23:59:59 = 0
2019-09-01 23:59:59 - 2019-09-02 23:59:59 = 0
2019-09-02 23:59:59 - 2019-09-03 23:59:59 = 2
2019-09-03 23:59:59 - 2019-09-04 23:59:59 = 1
2019-09-04 23:59:59 - 2019-09-05 23:59:59 = 2
2019-09-05 23:59:59 - 2019-09-06 23:59:59 = 0
2019-09-06 23:59:59 - 2019-09-07 23:59:59 = 0
2019-09-07 23:59:59 - 2019-09-08 23:59:59 = 0
2019-09-08 23:59:59 - 2019-09-09 23:59:59 = 0
greater than 2019-09-09 23:59:59 = 0

counts = [0,0,0,2,1,2,0,0,0,0]

4
  • mention desired output Commented Nov 20, 2019 at 4:33
  • I added to the bottom. Sorry if it was not clear at first Commented Nov 20, 2019 at 4:37
  • Still not clear, show an example. I thought it was getting the number of days between two dates? So in the sample you gave, on the first loop, you want to get the difference between 2019-09-04 23:44:52 and 2019-08-30 23:59:59? Commented Nov 20, 2019 at 4:55
  • I added something at the bottom Commented Nov 20, 2019 at 5:12

1 Answer 1

1

Assuming that this is df:

dates
2019-08-30 23:59:59
2019-08-31 23:59:59
2019-09-01 23:59:59
2019-09-02 23:59:59
2019-09-03 23:59:59
2019-09-04 23:59:59
2019-09-05 23:59:59
2019-09-06 23:59:59
2019-09-07 23:59:59
2019-09-08 23:59:59
2019-09-09 23:59:59

And this is df2

Time
2019-09-04 23:44:52
2019-09-02 20:38:40
2019-09-02 18:06:50
2019-09-04 20:18:06
2019-09-03 00:32:32

Assuming that the dates are in datetime64, this is the code for counting dates. Basically it holds each date ranges per loop, then checks each time log on those date ranges if they fit in it.

holder = ''
day_dif = []

for index,row in df.iterrows():
    i = 0
    if holder == '':
        holder = row['dates']
        continue

    for time in df2['Time']:

        first_date = holder
        end_date = row['dates']

        if (time >= first_date) & (time <= end_date):
            i += 1

    day_dif.append(i)
    print(i)
    holder = row['dates']

Printed Output:

2019-08-30 23:59:00 2019-08-31 23:59:00 0
2019-08-31 23:59:00 2019-09-01 23:59:00 0
2019-09-01 23:59:00 2019-09-02 23:59:00 2
2019-09-02 23:59:00 2019-09-03 23:59:00 1
2019-09-03 23:59:00 2019-09-04 23:59:00 2
2019-09-04 23:59:00 2019-09-05 23:59:00 0
2019-09-05 23:59:00 2019-09-06 23:59:00 0
2019-09-06 23:59:00 2019-09-07 23:59:00 0
2019-09-07 23:59:00 2019-09-08 23:59:00 0
2019-09-08 23:59:00 2019-09-09 23:59:00 0

List output of day_dif list:

[0, 0, 2, 1, 2, 0, 0, 0, 0, 0]
Sign up to request clarification or add additional context in comments.

5 Comments

So the dataframe contains days and times that are in my data. The list contains all possible days. I basically want to check how many days in the dataframe lie in each 24 hours periods. So in my output example I was showing each range. The courns array is what should be stored after the comparisons
@newwebdev22, I've updated my solution based from my understanding of your explanation. Hope its what you're looking for.
With a few tweaks, I was able to get this working for me! thank you!
Glad I could help @newwebdev22. Do you mind sharing the changes you did?
You had the times for the ranges as a datarame. I stored them in a list so I literally just tweaked the for loop for a litst. Worked nicely!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.