
I am using pandas to display a DataFrame, and my df looks like this:

  Day     Hour     Name     Msg
sunday     10        a       b
sunday     11        a       b
sunday     11        a       b
monday     12        a       b
tuesday    10        a       b
tuesday    10        a       b
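To make the example reproducible, the sample frame above can be rebuilt roughly like this. This is a minimal sketch: the values are copied from the table, and Name/Msg are just the placeholder letters shown there.

```python
import pandas as pd

# Rebuild the sample data from the table above
df = pd.DataFrame({
    'Day':  ['sunday', 'sunday', 'sunday', 'monday', 'tuesday', 'tuesday'],
    'Hour': [10, 11, 11, 12, 10, 10],
    'Name': ['a'] * 6,
    'Msg':  ['b'] * 6,
})
print(df)
```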

Now I want to summarize it to look like this:

sunday  3
monday  1
tuesday 2

and put this data in a DataFrame so I can plot it.

Any idea how I can do it? Thank you!

2 Answers


I think you need groupby with the size aggregation:

print(df.groupby('Day').size())
Day
monday     1
sunday     3
tuesday    2
dtype: int64
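A small sketch of why size is the right aggregation here: size counts rows per group, while count only counts non-missing values in a column. The Msg values with None are made up purely to show the difference.

```python
import pandas as pd

df = pd.DataFrame({
    'Day': ['sunday', 'sunday', 'sunday', 'monday', 'tuesday', 'tuesday'],
    # Hypothetical missing messages, only to contrast size() with count()
    'Msg': ['b', None, 'b', 'b', 'b', None],
})

# size() counts every row in each group; groupby sorts the keys alphabetically
counts = df.groupby('Day').size()
print(counts)

# count() on a column skips missing values, so it can report fewer rows
non_null = df.groupby('Day')['Msg'].count()
print(non_null)
```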

Then, if you need a bar plot:

import matplotlib.pyplot as plt

df.groupby('Day').size().plot.bar()
plt.show()
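Since the question also asks for the summary as a DataFrame, one option is reset_index with a name for the counts column, then plotting from that frame. A sketch; the column name 'count' is an arbitrary choice, and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'Day': ['sunday', 'sunday', 'sunday', 'monday', 'tuesday', 'tuesday']})

# reset_index(name=...) turns the counts Series into a two-column DataFrame
summary = df.groupby('Day').size().reset_index(name='count')
print(summary)

# Day on the x axis, count as the bar height; use plt.show() interactively
ax = summary.plot.bar(x='Day', y='count', legend=False)
```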


If the order of the days is important, convert the column Day to an ordered categorical:

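import matplotlib.pyplot as plt
import pandas as pd

cat = ['sunday','monday','tuesday']
# astype('category', ordered=..., categories=...) is not supported in recent pandas,
# so build the ordered categorical explicitly
df['Day'] = pd.Categorical(df['Day'], categories=cat, ordered=True)

# the groups now follow the order of cat instead of alphabetical order
df.groupby('Day', observed=False).size().plot.bar()
plt.show()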
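A side effect worth knowing: with observed=False, every category appears in the result, even days that have no rows. A sketch assuming a full seven-day week as the category list (the data itself only has three days):

```python
import pandas as pd

df = pd.DataFrame({'Day': ['sunday', 'sunday', 'sunday', 'monday', 'tuesday', 'tuesday']})

# Assumed full week order; days missing from the data still become categories
week = ['sunday', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday']
df['Day'] = pd.Categorical(df['Day'], categories=week, ordered=True)

# observed=False keeps unobserved categories, which get a size of 0
counts = df.groupby('Day', observed=False).size()
print(counts)
```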


If you don't want to use a categorical, another solution is to reindex by cat:

cat = ['sunday','monday','tuesday']
df.groupby('Day').size().reindex(cat).plot.bar()
plt.show()
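Reindexing by a list that contains days with no rows would insert NaN for them; passing fill_value=0 keeps the counts as integers. A sketch, again assuming a full week list:

```python
import pandas as pd

df = pd.DataFrame({'Day': ['sunday', 'sunday', 'sunday', 'monday', 'tuesday', 'tuesday']})

week = ['sunday', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday']

# reindex puts the labels in list order; fill_value=0 replaces the NaN
# that reindex would otherwise produce for days with no rows
counts = df.groupby('Day').size().reindex(week, fill_value=0)
print(counts)
```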

3 Comments

Thank you, though I didn't understand the "ordered categorical". What is it exactly? I couldn't understand it from the URL you gave.
Sorry, it was a bad link. You need pandas.pydata.org/pandas-docs/stable/…
After groupby, the days become the index of the result, and that index is sorted alphabetically. If you need a custom sort order, such as the days of the week, you need to create ordered categories; then it sorts by the order of the values in the list cat.
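To illustrate the comment above: an ordered categorical stores each value as its position in the category list, and sorting uses those positions instead of the alphabet. A minimal sketch with a made-up Series:

```python
import pandas as pd

days = pd.Series(['tuesday', 'sunday', 'monday', 'sunday'])

# Plain strings sort alphabetically
print(days.sort_values().tolist())

# An ordered categorical sorts by the position of each value in cat
cat = ['sunday', 'monday', 'tuesday']
ordered = days.astype(pd.CategoricalDtype(categories=cat, ordered=True))
print(ordered.sort_values().tolist())

# The underlying integer codes are those positions
print(ordered.cat.codes.tolist())
```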

jezrael's answer is great, but there is a slightly easier way:

df.Day.value_counts()

Yields:

sunday     3
tuesday    2
monday     1
Name: Day, dtype: int64
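A quick sketch showing that value_counts and groupby(...).size() give the same numbers, only in a different order, so aligning on the index makes them compare equal:

```python
import pandas as pd

df = pd.DataFrame({'Day': ['sunday', 'sunday', 'sunday', 'monday', 'tuesday', 'tuesday']})

by_size = df.groupby('Day').size()    # sorted by day name
by_counts = df['Day'].value_counts()  # sorted by count, largest first

# Same counts once both are aligned on the same index order
print(by_counts.reindex(by_size.index).tolist() == by_size.tolist())
```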

They are ordered from the largest group to the smallest, which keeps the big groups from getting lost. If you want them in the order the days first appear, reindex by the unique values:

df.Day.value_counts().reindex(df.Day.unique())

Giving:

sunday     3
monday     1
tuesday    2
Name: Day, dtype: int64

You could also order them any way you like by passing a custom list to .reindex().
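Both orderings can be sketched like this; the Monday-first list is a hypothetical custom order, and fill_value=0 covers a day that never occurs in the data:

```python
import pandas as pd

df = pd.DataFrame({'Day': ['sunday', 'sunday', 'sunday', 'monday', 'tuesday', 'tuesday']})
counts = df['Day'].value_counts()

# Order of first appearance in the data
first_seen = counts.reindex(df['Day'].unique())
print(first_seen.tolist())

# Hypothetical custom order; 'wednesday' has no rows, so it gets 0
custom = ['monday', 'tuesday', 'wednesday', 'sunday']
in_custom_order = counts.reindex(custom, fill_value=0)
print(in_custom_order.tolist())
```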

For plotting, you can then do:

df.Day.value_counts().plot.bar()

Or

df.Day.value_counts().plot.bar(figsize=(2,2))

