Calculate all missing values for specific data using pivot tables in pandas

Question

I am working on the titanic.csv dataset. To simplify the problem, here is some of the data:

I need to count all missing values for child, which is one of the values in the who column. This must be done using a pivot table.

I have tried this solution:

pd.pivot_table(df[df['who'] == 'child'],
               index='sex',
               aggfunc=lambda x: x.isnull().sum(),
               margins=True)  # to sum all missing values based on gender

But I get this output, in which, as you can see, the ALL row does not add up the missing values of each gender.

What is wrong with my code? Should I create the pivot table in a different way?

Here is a smaller dataset to reproduce the problem:
data = {'survived': [0, 1, 1, 1, 0],
        'pclass': [3, 1, 3, 1, 3],
        'sex': ['male', 'female', 'female', 'female', 'male'],
        'age': [22, 38, 6, 35, 35],
        'class': ['Third', 'First', 'Third', 'First', 'Third'],
        'who': ['man', 'woman', 'child', 'child', 'man'],
        'deck': [None, 'C', None, 'C', None],
        'alive': ['no', 'yes', 'yes', 'yes', 'no'],
        'alone': [False, False, True, False, True]}
– Anisa B., Commented Dec 14, 2024 at 22:19
Please show the output you expect for your smaller sample dataset. Note that both this and the sample data are best edited into the question rather than posted in comments.
– user19077881, Commented Dec 14, 2024 at 22:53
@user19077881 The output I need is the same as the output shown, except that the last column, ALL, should add up the 33 + 37 missing values and report 70, not 0. That is why I included margins. Why doesn't it do that? What should I change in my code?
– Anisa B., Commented Dec 15, 2024 at 11:53

rehaqds · Accepted Answer · 2024-12-15 12:06:23Z

1

EDIT:

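If you prefer to use a pivot table, just add the parameter dropna=False to get the result you want. With the default dropna=True, pivot_table omits every row that has a NaN in any column before computing the margins, so the All margin is built only from rows with no missing values and always counts 0.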
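A minimal sketch of the difference, assuming the smaller sample dataset posted in the comments under the question (among the two children, only one deck value is missing). The helper name count_missing is hypothetical; it does the same thing as the question's lambda.

```python
import pandas as pd

# Smaller sample dataset posted in the comments under the question
data = {
    'survived': [0, 1, 1, 1, 0],
    'pclass': [3, 1, 3, 1, 3],
    'sex': ['male', 'female', 'female', 'female', 'male'],
    'age': [22, 38, 6, 35, 35],
    'class': ['Third', 'First', 'Third', 'First', 'Third'],
    'who': ['man', 'woman', 'child', 'child', 'man'],
    'deck': [None, 'C', None, 'C', None],
    'alive': ['no', 'yes', 'yes', 'yes', 'no'],
    'alone': [False, False, True, False, True],
}
df = pd.DataFrame(data)
children = df[df['who'] == 'child']


def count_missing(x):
    # Number of missing (NaN/None) entries in one column of one group
    return x.isnull().sum()


# Default dropna=True: rows containing any NaN are dropped before the
# margins are computed, so the "All" row cannot see any missing values.
default = pd.pivot_table(children, index='sex', aggfunc=count_missing,
                         margins=True)

# dropna=False: the "All" row is computed from every child row.
fixed = pd.pivot_table(children, index='sex', aggfunc=count_missing,
                       margins=True, dropna=False)

print(default[['deck']])
print(fixed[['deck']])
```

With this data, the deck column shows 1 for female in both tables, but the All row shows 0 in default and 1 in fixed.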

First answer:

If you want the number of missing values per feature for children only, you can call isna/isnull directly after filtering:

import pandas as pd

data = {'survived': [0, 1, 1, 1, 0],
        'pclass': [3, 1, None, 1, 3],
        'sex': ['male', 'female', 'female', 'female', 'male'],
        'age': [22, 38, None, None, 35],
        'class_': ['Third', 'First', None, 'First', 'Third'],
        'who': ['man', 'woman', 'child', 'child', 'man'],
        'deck': [None, 'C', None, 'C', None],
        'alive': ['no', 'yes', 'yes', 'yes', 'no'],
        'alone': [False, False, True, False, True]}
df = pd.DataFrame(data)

# Count missing values per column among the rows where who == 'child'
print(df[df["who"] == "child"].isna().sum())

survived    0
pclass      1
sex         0
age         2
class_      1
who         0
deck        1
alive       0
alone       0

edited Dec 15, 2024 at 12:06

answered Dec 14, 2024 at 23:01

rehaqds



3 Comments

Anisa B. Dec 15, 2024 at 11:54

Thank you for your answer, but it doesn't answer my question. I specifically need to create a pivot table, as stated in the question title.

rehaqds Dec 15, 2024 at 12:08

Ok, I edited my answer.

Anisa B. Dec 15, 2024 at 12:13

Well, now it answers my question.

Anisa B. · Accepted Answer · 2024-12-15 12:32:47Z

0

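The problem can also be solved by changing index to 'who' and removing the margins parameter, which is then not needed. Because every filtered row has who == 'child', the table has a single row whose values are the total missing-value counts per column.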

pd.pivot_table(df[df['who'] == 'child'],
               index='who',
               aggfunc=lambda x: x.isnull().sum())
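A self-contained sketch of this approach, assuming the sample data from the first answer (which has extra missing values); the names totals and direct are only illustrative. It also checks that the single resulting row matches filtering followed by isna().sum().

```python
import pandas as pd

# Sample data from the first answer (contains several missing values)
data = {'survived': [0, 1, 1, 1, 0],
        'pclass': [3, 1, None, 1, 3],
        'sex': ['male', 'female', 'female', 'female', 'male'],
        'age': [22, 38, None, None, 35],
        'class_': ['Third', 'First', None, 'First', 'Third'],
        'who': ['man', 'woman', 'child', 'child', 'man'],
        'deck': [None, 'C', None, 'C', None],
        'alive': ['no', 'yes', 'yes', 'yes', 'no'],
        'alone': [False, False, True, False, True]}
df = pd.DataFrame(data)
children = df[df['who'] == 'child']

# One group ('child'), so each cell is that column's total missing count.
totals = pd.pivot_table(children,
                        index='who',
                        aggfunc=lambda x: x.isnull().sum())

# Same counts computed without a pivot table ('who' is the index above,
# so it is dropped here to compare like with like).
direct = children.drop(columns='who').isna().sum()

print(totals)
print(direct)
```

Note that pivot_table sorts the value columns alphabetically, so the column order of totals differs from the order in data.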

answered Dec 15, 2024 at 12:32

Anisa B.

