I have a data frame and want to select some of its rows by the value in a column. In this case, I want the rows whose 'reports' value is between 10 and 31 (inclusive).

import pandas as pd

data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy', 'Daisy', 'River', 'Kate', 'David', 'Jack', 'Nancy'], 
    'month of entry': ["20171002", "20171206", "20171208", "20171018", "20090506", "20171128", "20101216", "20171230", "20171115", "20171030", "20171216"],
    'reports': [14, 24, 31, 22, 34, 6, 47, 2, 14, 10, 8]}
df = pd.DataFrame(data)

df_4 = df[(df.reports >= 10) | (df.reports <= 31)]
df_5 = df.query('reports >= 10 | reports <= 31')

print(df_4)
print(df_5)

Both approaches produce the same wrong result (47 is still there!):

   month of entry   name  reports
0        20171002  Jason       14
1        20171206  Molly       24
2        20171208   Tina       31
3        20171018   Jake       22
4        20090506    Amy       34
5        20171128  Daisy        6
6        20101216  River       47
7        20171230   Kate        2
8        20171115  David       14
9        20171030   Jack       10
10       20171216  Nancy        8

What went wrong? Thank you.

Replace df_4 = df[(df.reports >= 10) | (df.reports <= 31)] with df_4 = df[(df.reports >= 10) & (df.reports <= 31)]. You want both conditions to be true, so use and, not or. Commented Mar 16, 2018 at 7:35

2 Answers

2

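You need & (element-wise AND) instead of | (OR): every number is either >= 10 or <= 31, so the OR condition matches all rows. Even better, use between, which includes both endpoints by default: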

df1 = df[(df.reports >= 10) & (df.reports <= 31)]

Or:

df1 = df[df.reports.between(10,31)] 
print (df1)
  month of entry   name  reports
0       20171002  Jason       14
1       20171206  Molly       24
2       20171208   Tina       31
3       20171018   Jake       22
8       20171115  David       14
9       20171030   Jack       10

Detail:

print ((df.reports >= 10) & (df.reports <= 31))
0      True
1      True
2      True
3      True
4     False
5     False
6     False
7     False
8      True
9      True
10    False
Name: reports, dtype: bool
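
To see why the original condition kept every row, it may help to compare the two masks directly. The following is a minimal sketch, assuming Python 3 and using only the 'reports' values from the question; the variable names (or_mask, and_mask, between_mask) are illustrative and not part of the original post:

```python
import pandas as pd

# The 'reports' column from the question, as a standalone Series.
reports = pd.Series([14, 24, 31, 22, 34, 6, 47, 2, 14, 10, 8], name="reports")

# OR: true whenever a value is >= 10 or <= 31, which covers every number.
or_mask = (reports >= 10) | (reports <= 31)

# AND: true only when a value satisfies both bounds at once.
and_mask = (reports >= 10) & (reports <= 31)

# between() includes both endpoints by default, so it matches the AND mask.
between_mask = reports.between(10, 31)

print(or_mask.all())                  # True: the OR filter keeps every row
print(and_mask.equals(between_mask))  # True: both express the same range check
print(reports[and_mask].tolist())     # [14, 24, 31, 22, 14, 10]
```

Because the OR mask is True everywhere, indexing with it returns the whole frame, which is what the question observed.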

2
import pandas as pd

data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy', 'Daisy', 'River', 'Kate', 'David', 'Jack', 'Nancy'], 
    'month of entry': ["20171002", "20171206", "20171208", "20171018", "20090506", "20171128", "20101216", "20171230", "20171115", "20171030", "20171216"],
    'reports': [14, 24, 31, 22, 34, 6, 47, 2, 14, 10, 8]}
df = pd.DataFrame(data)
df_4 = df[(df.reports >= 10) & (df.reports <= 31)]   # Use '&' instead of '|' so both conditions must hold
print(df_4)

Output:

  month of entry   name  reports
0       20171002  Jason       14
1       20171206  Molly       24
2       20171208   Tina       31
3       20171018   Jake       22
8       20171115  David       14
9       20171030   Jack       10
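
The question also used df.query with the same | operator, and the same fix applies there. A minimal sketch, assuming Python 3 and a trimmed copy of the question's data (the 'month of entry' column is left out for brevity; df_6 is an illustrative name not in the original post):

```python
import pandas as pd

# Trimmed copy of the question's data: only the columns needed for the filter.
df = pd.DataFrame({
    "name": ["Jason", "Molly", "Tina", "Jake", "Amy", "Daisy",
             "River", "Kate", "David", "Jack", "Nancy"],
    "reports": [14, 24, 31, 22, 34, 6, 47, 2, 14, 10, 8],
})

# Inside a query string, 'and' requires both conditions to hold.
df_5 = df.query("reports >= 10 and reports <= 31")

# The symbolic form works too; parentheses make the grouping explicit.
df_6 = df.query("(reports >= 10) & (reports <= 31)")

print(df_5)
print(df_5.equals(df_6))  # True: both spellings select the same rows
```

Either spelling should select the same six rows as the boolean-mask version above (index 0, 1, 2, 3, 8 and 9).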

2 Comments

Thank you! Would you mind if I accept jezrael's answer, since he provided two methods?
Sure, no problem :). I just like to practice code snippets :)
