
I am working with a dataset called titanic.csv. To simplify the problem, here is some of the data:

[![enter image description here](https://i.sstatic.net/LKGk47dr.png)](https://i.sstatic.net/LKGk47dr.png)

I need to count the missing values in each column for the rows where the who column is child. This must be done using a pivot table.

I have tried this solution:

pd.pivot_table(df[df['who'] == 'child'],
               index='sex',
               aggfunc=lambda x: x.isnull().sum(),
               margins=True)  # to sum all missing values based on gender

But in the output I get with the code above, the All row doesn't sum the missing values across genders.

Where is the problem in my code? Should I use another way to create the pivot table?

  • Here is a smaller dataset to reproduce the problem: ` data = { 'survived': [0, 1, 1, 1, 0], 'pclass': [3, 1, 3, 1, 3], 'sex': ['male', 'female', 'female', 'female', 'male'], 'age': [22, 38, 6, 35, 35], 'class': ['Third', 'First', 'Third', 'First', 'Third'], 'who': ['man', 'woman', 'child', 'child', 'man'], 'deck': [None, 'C', None, 'C', None], 'alive': ['no', 'yes', 'yes', 'yes', 'no'], 'alone': [False, False, True, False, True] } ` Commented Dec 14, 2024 at 22:19
  • Please show the output you expect using your smaller sample dataset. Note that this and the sample data are best edited into the question rather than given in comments. Commented Dec 14, 2024 at 22:53
  • @user19077881 The output I need is no different from the output provided. But as you can see, the All row in the last column should sum the 33 + 37 missing values and report 70, not 0. This is why I included margins. Why doesn't it do that? What should I change in my code? Commented Dec 15, 2024 at 11:53

2 Answers

EDIT:

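If you prefer to use a pivot table, just add the parameter dropna=False to get the result you want. With the default dropna=True, rows containing a NaN in any column are omitted before the margins are computed, which is why the All row reports 0.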
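As a minimal sketch of that fix, the snippet below reuses the pivot table from the question and only adds dropna=False. The small DataFrame is hypothetical sample data made up for illustration (it is not the real titanic.csv), and the name children is just a helper for the filtered rows.

```python
import pandas as pd

# Hypothetical sample data (not the real titanic.csv), built so that
# the "child" rows contain some missing values.
df = pd.DataFrame({
    "sex":  ["male", "female", "female", "male", "female", "male"],
    "who":  ["child", "child", "child", "child", "woman", "man"],
    "age":  [None, 4.0, None, None, 30.0, 40.0],
    "deck": [None, "C", None, "E", None, "B"],
})

# Keep only the rows where who == "child".
children = df[df["who"] == "child"]

# Same call as in the question, plus dropna=False so that rows with
# NaN are still present when the "All" margin is computed.
table = pd.pivot_table(
    children,
    index="sex",
    aggfunc=lambda x: x.isnull().sum(),
    margins=True,
    dropna=False,
)
print(table)
```

In this sample, the All row for age should equal the female count plus the male count instead of 0.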


First answer:

If you want the number of missing values per feature for the child rows only, you can use isna/isnull directly after filtering:

import pandas as pd

# Small sample with some missing values in the child rows
data = {'survived': [0, 1, 1, 1, 0],
        'pclass': [3, 1, None, 1, 3],
        'sex': ['male', 'female', 'female', 'female', 'male'],
        'age': [22, 38, None, None, 35],
        'class_': ['Third', 'First', None, 'First', 'Third'],
        'who': ['man', 'woman', 'child', 'child', 'man'],
        'deck': [None, 'C', None, 'C', None],
        'alive': ['no', 'yes', 'yes', 'yes', 'no'],
        'alone': [False, False, True, False, True]}
df = pd.DataFrame(data)

# Filter the child rows, then count missing values per column
# (use display(...) instead of print(...) in a notebook)
print(df[df["who"] == "child"].isna().sum())

survived    0
pclass      1
sex         0
age         2
class_      1
who         0
deck        1
alive       0
alone       0

3 Comments

Thank you for your answer, but it doesn't answer my question. I specifically need to create a pivot table, as also stated in the question title.
Ok, I edited my answer.
Well, now it answers my question.

This could also be solved by changing index to 'who' and removing the margins parameter, which is then no longer needed.

pd.pivot_table(df[df['who'] == 'child'], 
    index = 'who', 
    aggfunc = lambda x: x.isnull().sum())
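
As a rough sanity check of this approach, the sketch below compares the single child row of such a pivot table with a direct isna().sum() on the filtered rows. The DataFrame is hypothetical sample data made up for illustration, and the names children, by_who and direct are helpers introduced here.

```python
import pandas as pd

# Hypothetical sample data (not the real titanic.csv).
df = pd.DataFrame({
    "sex":  ["male", "female", "female", "male", "female", "male"],
    "who":  ["child", "child", "child", "child", "woman", "man"],
    "age":  [None, 4.0, None, None, 30.0, 40.0],
    "deck": [None, "C", None, "E", None, "B"],
})

children = df[df["who"] == "child"]

# Pivot on "who": after filtering there is a single group ("child"),
# so its row already holds the totals and no margins are needed.
by_who = pd.pivot_table(
    children,
    index="who",
    aggfunc=lambda x: x.isnull().sum(),
)

# Direct count of missing values per column, for comparison.
direct = children.drop(columns="who").isna().sum()

print(by_who)
print(direct)
```

Both results should report the same counts per column; only the layout differs (one row in the pivot table versus a Series).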
