4

How would I go about extracting rows where a column matches a specific value from a Dataframe created from an excel file?

Here are few rows from the Dataframe:

    Food            Men     Women
0   Total fruit     86.20   88.26
1   Apples, Total   89.01   89.66
2   Apples as fruit 89.18   90.42
3   Apple juice     88.78   88.42
4   Bananas         95.42   94.18
5   Berries         84.21   81.73
6   Grapes          88.79   88.13

and this is the code that I used to read the excel file, choosing the columns I needed and renaming them appropriately:

data1= pd.read_excel('USFoodCommodity.xls', sheetname='94-98 FAH', skiprows=76,skip_footer=142, parse_cols='A, H, K')
data1.columns = ['Food', 'Men', 'Women']

# Try 1: data1 = data1[data1['Food'].isin(['Total fruit']) == True] works
# Try 2: data1 = data1[data1['Food'].isin(['Apple, Total']) == True] doesn't work
# Try 3: data1 = data1.iloc[[1]] returns Apples, Total but not appropriate to use integer index
# Try 4: data1[data1['Food'] == 'Berries'] doesn't work

So far based on the answers such as this, this, or here I have only been able to return the first index where Food = "Total fruit". When I try the other methods above I only get the column names such as:

Food    Men Women

I am new to pandas and can't see where I am going wrong. Why can I extract the first row where Food == Total fruit but not anything else?

3 Answers 3

2

For me it works nice, maybe problem with some whitespaces - remove them by strip:

print (data1.Food.tolist())
['Total fruit', 'Apples, Total ', 'Apples as fruit', 
'Apple juice', 'Bananas', ' Berries', 'Grapes']

data1['Food'] = data1['Food'].str.strip()

print (data1.Food.tolist())
['Total fruit', 'Apples, Total', 'Apples as fruit', 
'Apple juice', 'Bananas', 'Berries', 'Grapes']

data2 = data1[data1['Food'].isin(['Total fruit'])]
print (data2)
          Food   Men  Women
0  Total fruit  86.2  88.26

data3 = data1[data1['Food'].isin(['Apples, Total'])]
print (data3)
            Food    Men  Women
1  Apples, Total  89.01  89.66

data3 = data1[data1['Food'].isin(['Berries'])]
print (data3)
      Food    Men  Women
5  Berries  84.21  81.73
Sign up to request clarification or add additional context in comments.

Comments

0

Use this code

data1= pd.read_excel('USFoodCommodity.xls', sheetname='94-98 FAH', skiprows=76,skip_footer=142, parse_cols='A, H, K')
list_of_strings_to_match = ['Total fruit', 'Berries', 'Grape']
for index, row in data1.iterrows():
   if row['Food'] in list_of_strings_to_match:
      print row

1 Comment

No row results for Berries or Grapes
0

This question might be old, but here is an easier and intuitive way.

Note: This solution only works on pandas >= 0.13.

You can now select columns from a dataframe using the .query() method.

It's as simple as:

df.query('column == value') # The comparison operator can be anything.

For example, in your case, you could query like so:

data1.query('Food == "Total Fruit"')

or

data1.query('Food == Berries')

To access a variable, use @.

fruit = "berries"
data1.query('Food == @fruit')

You can even club multiple conditions using &.

data1.query('condition1 == value1 & condition2 == value2')

Hope it helped.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.