
I have a list of NumPy arrays, one per occupation. I want to use seaborn to draw a box plot of each of them in one figure.

subscribers=bankData.loc[bankData['deposit']==1] # only clients who subscribed to a term deposit

occupations=bankData['job'].unique().tolist()

admin=subscribers['age'].loc[subscribers['job']=='admin.'].values
technician=subscribers['age'].loc[subscribers['job']=='technician'].values
services=subscribers['age'].loc[subscribers['job']=='services'].values
management=subscribers['age'].loc[subscribers['job']=='management'].values
retired=subscribers['age'].loc[subscribers['job']=='retired'].values
blue_collar=subscribers['age'].loc[subscribers['job']=='blue-collar'].values
unemployed=subscribers['age'].loc[subscribers['job']=='unemployed'].values
enterpreneur=subscribers['age'].loc[subscribers['job']=='entrepreneur'].values
housemaid=subscribers['age'].loc[subscribers['job']=='housemaid'].values
unknown= subscribers['age'].loc[subscribers['job']=='unknown'].values
self_employed=subscribers['age'].loc[subscribers['job']=='self-employed'].values
student=subscribers['age'].loc[subscribers['job']=='student'].values

occupation_age=[admin, technician, services, management, retired, blue_collar, unemployed, enterpreneur, housemaid,
                unknown, self_employed, student]

I want each box in the plot to show one array from occupation_age.

  • See this and this and this for examples. (Commented Jun 9, 2019 at 2:42)

1 Answer


There is no need to split the data frame into separate NumPy arrays. Pass the column names and the data frame to the seaborn plot:

sns.boxplot(x='job', y='age', data=subscribers)
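If you already have the separate per-occupation arrays from the question, you can stack them back into "long form": one row per observation with a job label. That is the shape sns.boxplot(x=..., y=..., data=...) expects. The sketch below is minimal. The three arrays and the names list are made-up stand-ins for admin, technician, and so on, and it assumes names lists the labels in the same order as the arrays.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the per-occupation age arrays in the question.
names = ['admin.', 'technician', 'services']
occupation_age = [np.array([35, 41, 29]),
                  np.array([50, 38]),
                  np.array([27, 33, 45, 52])]

# Long form: repeat each label once per value in its array,
# and concatenate all the arrays into a single 'age' column.
long_df = pd.DataFrame({
    'job': np.repeat(names, [len(a) for a in occupation_age]),
    'age': np.concatenate(occupation_age),
})

print(long_df.head())
```

Passing long_df to sns.boxplot(x='job', y='age', data=long_df) then draws one box per array. This is the same layout that subscribers[['job', 'age']] already has, which is why splitting it up first is unnecessary.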

To demonstrate on random, seeded data:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

np.random.seed(682019)
occupations = ['admin', 'technician', 'management', 'retired', 'blue_collar',
               'unemployed', 'enterpreneur', 'housemaid',
               'unknown', 'self_employed', 'student']
subscribers = pd.DataFrame({'job': np.random.choice(occupations, 100),
                            'age': np.random.uniform(0, 100, 100)})

print(subscribers.head(10))
#              job        age
# 0     technician   2.188924
# 1    blue_collar  40.868834
# 2     management  44.179859
# 3     technician  72.193644
# 4   enterpreneur  83.680639
# 5   enterpreneur  60.923324
# 6        student  99.163055
# 7     management  80.392648
# 8        unknown  96.985044
# 9  self_employed  92.147679

fig, ax = plt.subplots(figsize=(14,5))
sns.boxplot(y='age', x='job', data=subscribers, ax=ax)

plt.show()
plt.clf()
plt.close()

[Box plot output image]

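To sort the boxes in descending order of median age, add a per-row aggregate column with groupby().transform() and sort the data frame by that column. For string categories, seaborn draws the boxes in the order the categories first appear in the data: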

subscribers['job_median'] = subscribers.groupby('job')['age'].transform('median')
subscribers = subscribers.sort_values('job_median', ascending=False)

fig, ax = plt.subplots(figsize=(14,5))
sns.boxplot(y='age', x='job', data=subscribers, ax=ax)

plt.show()
plt.clf()
plt.close()

[Sorted box plot output image]
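Instead of adding a helper column and re-sorting the frame, you can pass the category order explicitly through the order parameter of sns.boxplot. Below is a minimal sketch on made-up random data. The jobs list, the age range and the Agg backend are illustrative assumptions, not part of the answer above.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Random demo data shaped like the answer's example (assumed values).
rng = np.random.default_rng(0)
jobs = ['admin', 'technician', 'management', 'retired', 'student']
subscribers = pd.DataFrame({'job': rng.choice(jobs, 100),
                            'age': rng.uniform(18, 90, 100)})

# Median age per job, largest first; the resulting index is the box order.
order = (subscribers.groupby('job')['age']
         .median()
         .sort_values(ascending=False)
         .index.tolist())

fig, ax = plt.subplots(figsize=(14, 5))
# 'order' fixes the left-to-right box order without re-sorting the frame.
sns.boxplot(x='job', y='age', data=subscribers, order=order, ax=ax)
plt.close(fig)
```

Here groupby().median() returns one value per job, whereas transform('median') broadcasts that value back to every row. The per-job result is all you need to build an order list.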


2 Comments

How can I put them in descending order in the plot?
If you mean by median, see the edit, which adds a new aggregate column used for sorting.
