I have a dataframe that looks like this:

                timestamp       Value       Color
--------------------------------------------------
 0    2018-03-04 07:11:08          34         Red
 1    2018-03-04 07:11:09          34         Red
 2    2018-03-04 07:11:10          35         Red
 3    2018-03-04 07:11:12          36         Red
 4    2018-03-04 07:11:14          24         Red
 5    2018-03-04 07:11:15          34         Red
... 
55    2018-03-04 07:12:17          34        Blue
56    2018-03-04 07:12:18          35        Blue
57    2018-03-04 07:12:19          36        Blue
58    2018-03-04 07:12:20          37        Blue
59    2018-03-04 07:12:21          35        Blue
60    2018-03-04 07:12:22          32        Blue

Over the course of about 60 seconds, a value is recorded at each timestamp, and the values are split between two colors, Red and Blue. So the dataframe holds two time series that occur one after the other and do not overlap. I want to plot them while ignoring the timestamps: assume both start at the same time and treat each color as an array of ordered values, ignoring time skips and assuming equally spaced intervals. I simply want the Red curve and the Blue curve on the same chart. How can this be done in Python? I am trying simply

plt.plot(Blue, Red)

Though I am not sure how to account for the x-axis, which I simply want to be seconds.

1 Answer

import numpy
import pandas
from matplotlib import pyplot

df = pandas.DataFrame({
            'times':list(pandas.date_range('2020-01-01',periods=10,freq='15min')) + 
                    list(pandas.date_range('2020-01-01',periods=10,freq='15min')),
            'colors':['red']*10 + ['blue'] * 10,
            'value': numpy.random.randint(0,255,20)
    })

gives us something like your dataframe

                 times colors  value
0  2020-01-01 00:00:00    red    224
1  2020-01-01 00:15:00    red     47
2  2020-01-01 00:30:00    red     25
3  2020-01-01 00:45:00    red    211
4  2020-01-01 01:00:00    red     18
5  2020-01-01 01:15:00    red    119
6  2020-01-01 01:30:00    red     52
7  2020-01-01 01:45:00    red    246
8  2020-01-01 02:00:00    red     54
9  2020-01-01 02:15:00    red    156
10 2020-01-01 00:00:00   blue     42
11 2020-01-01 00:15:00   blue     55
12 2020-01-01 00:30:00   blue    151
13 2020-01-01 00:45:00   blue    236
14 2020-01-01 01:00:00   blue    207
15 2020-01-01 01:15:00   blue    165
16 2020-01-01 01:30:00   blue    131
17 2020-01-01 01:45:00   blue    199
18 2020-01-01 02:00:00   blue    247
19 2020-01-01 02:15:00   blue     61

we can pivot this using

df2 = df.pivot(index='times',columns=['colors'],values=['value'])

which gives us (the numbers differ from the table above because the values are random on every run)

                        value     
colors               blue  red
times                         
2020-01-01 00:00:00    70  225
2020-01-01 00:15:00   162   78
2020-01-01 00:30:00   188   37
2020-01-01 00:45:00   134  234
2020-01-01 01:00:00    46   73
2020-01-01 01:15:00    76   60
2020-01-01 01:30:00   143   61
2020-01-01 01:45:00   150  198
2020-01-01 02:00:00    82  159
2020-01-01 02:15:00   127   94

now we can easily just plot it...

df2.plot()
pyplot.show()

you can drop the value part of the column name with

df2 = df2.droplevel(0,axis=1)
df2.plot()
pyplot.show()
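
As a small aside to the droplevel step, here is a minimal sketch of getting flat column names straight from the pivot. It assumes the same made-up dataframe as above (with a fixed seed so it is repeatable) and relies on the fact that passing `columns` and `values` as plain strings rather than lists yields ordinary, single-level columns.

```python
import numpy
import pandas

# Same shape as the sample dataframe above; the seed is only there for repeatability.
rng = numpy.random.default_rng(0)
df = pandas.DataFrame({
    'times': list(pandas.date_range('2020-01-01', periods=10, freq='15min')) * 2,
    'colors': ['red'] * 10 + ['blue'] * 10,
    'value': rng.integers(0, 255, 20),
})

# Strings instead of lists: the result has plain 'blue' / 'red' columns,
# so no droplevel call is needed afterwards.
df2 = df.pivot(index='times', columns='colors', values='value')
print(df2.columns.tolist())
```

Calling df2.plot() on this frame then labels the legend with the color names only.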


The other option is to just call it individually

BLUE = df[df['colors'] == 'blue']
RED = df[df['colors'] == 'red']
pyplot.plot(BLUE['times'],BLUE['value'])
pyplot.plot(RED['times'],RED['value'])
pyplot.show()

you could also use pandas groupby (probably don't do this one :P )

def plot_it(group):
    # apply passes each color's sub-dataframe as a single argument
    pyplot.plot(group['times'],group['value'])
df.groupby('colors').apply(plot_it)
pyplot.show()

but really the "right" way to handle it is probably the first option (to pivot it to the shape you want)

---- Edit (based on comments) ----

if you don't want the timestamps and just want to treat each color as a list of y values, use range as your x so that both curves start at 0

BLUE = df[df['colors'] == 'blue']
RED = df[df['colors'] == 'red']
pyplot.plot(range(len(BLUE)),BLUE['value'])
pyplot.plot(range(len(RED)),RED['value'])
pyplot.show()
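
The same alignment can also be built into the pivot approach. The following is only a sketch under assumptions: it uses the column names from the question (timestamp, Value, Color), sample values that are partly made up, and a hypothetical helper column called `step`. It numbers the rows within each color with groupby(...).cumcount() and pivots on that counter instead of on the timestamps, so both curves start at 0 even when one is longer.

```python
import numpy
import pandas
from matplotlib import pyplot

# Illustrative data: Red runs first, Blue starts about a minute later and is longer.
# The last two Blue values are invented for the example.
red_times = pandas.Timestamp('2018-03-04 07:11:08') + pandas.to_timedelta(numpy.arange(6), unit='s')
blue_times = pandas.Timestamp('2018-03-04 07:12:17') + pandas.to_timedelta(numpy.arange(8), unit='s')
df = pandas.DataFrame({
    'timestamp': list(red_times) + list(blue_times),
    'Value': [34, 34, 35, 36, 24, 34, 34, 35, 36, 37, 35, 32, 33, 31],
    'Color': ['Red'] * 6 + ['Blue'] * 8,
})

# Position of each row within its own color: 0, 1, 2, ... restarting per group.
# 'step' is a hypothetical column name, not something from the original post.
df['step'] = df.groupby('Color').cumcount()

# One column per color, indexed by step rather than by timestamp.
# The shorter curve is padded with NaN at the end, which plot() simply skips.
wide = df.pivot(index='step', columns='Color', values='Value')

# Columns come out sorted (Blue, Red), so the colors are listed in that order.
ax = wide.plot(color=['blue', 'red'])
ax.set_xlabel('sample number (treated as seconds)')
ax.set_ylabel('Value')
pyplot.show()
```

Because the index is just a row counter, gaps in the original timestamps (such as the skipped seconds in the Red data) disappear, which matches the "equally spaced" assumption in the question.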

7 Comments

This is very helpful, thank you. I tried your first option, and I got the plot, though the curves are not beginning at the same timestamp. This is how the data actually is, since the Blue curve starts after the Red curve ends, but I had wanted to remove the timestamps, so that while the values are still in order, it is as if Red and Blue start at the 00:00:00 timestamp, so just plotting them as arrays. Sorry if my post was not clear on that. I am trying to have the red curve and blue curve start at the same point on the x-axis, regardless of whether one curve goes longer, which it will.
don't do the index='times' in the pivot call and i think it will do what you want
I tried that, and it still recognizes that the curves start at different points, so I get the same plot as before, where Blue starts after Red. This figures: when I remove index='times', drop the timestamp column completely, and then pivot, I see all NaN values in the Blue column for the rows where the Red column has values. That shows it still recognizes that Red gets values before Blue does, when I want both to get their values starting at the very first row. Thanks for the suggestion though!
ahh yeah i guess that doesn't quite work ... :/ sorry, i'm sure you can adjust this to make it work... but for that it might be easier to use one of the other solutions
I tried the other options, and still the Blue curve is shown after the Red curve, rather than both starting at 0 on this x-axis. Hmmm, this might be trickier than I had anticipated.