7

I understand that a pandas DataFrame lets you test a logical condition against its values.

Here's the code:

import pandas as pd
# 0o24204 is an octal literal (10372 in decimal); the bare 024204 form only parses in Python 2.
# The rows are passed to the constructor because DataFrame.append was removed in pandas 2.0.
data = pd.DataFrame([
    {'a': 'I have data', 'b': 'no more complexe', 'c': 0o24204},
    {'a': 'audoausd', 'b': '2048rafaf', 'c': 29313},
    {'a': 'koplak ente gan', 'b': 'ente g bisa koplak', 'c': 29313},
], columns=['a', 'b', 'c'])

Now we have the following dataframe:

                 a                   b      c
0      I have data    no more complexe  10372
1         audoausd           2048rafaf  29313
2  koplak ente gan  ente g bisa koplak  29313

Test a condition on column c and save the result to a variable:

c = data.c > 20000

This sets c to the following value:

0    False
1     True
2     True
Name: c, dtype: bool

Test a condition on column b and save the result to a variable:

b = data.b.str.contains('koplak')

The value of b:

0    False
1    False
2     True
Name: b, dtype: bool

And the same for column a:

a = data.a.str.contains('koplak')

The value of a:

0    False
1    False
2     True
Name: a, dtype: bool

When I combine all of these values with a & b & c, the result is:

0    False
1    False
2     True
dtype: bool

Hard-coding the expression doesn't scale when many columns are involved, so I tried putting all of the column conditions in a list:

logic = [a, b, c]

How do I combine all the items automatically to get the a & b & c result?

2 Answers

12

a & b & c is equivalent to

import functools
print(functools.reduce(lambda x,y: x & y, [a, b, c]))

which yields

0    False
1    False
2     True
dtype: bool

Unlike my original answer below (suggesting np.logical_and.reduce), I am confident functools.reduce(lambda x,y: x & y, [a, b, c]) will faithfully return the same Series as a & b & c.

(In Python 2.7, reduce is a builtin function, and functools.reduce is the same function. In Python 3, reduce was removed from the builtins and only functools.reduce remains. So to future-proof your code, use functools.reduce.)
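As a small variation on the reduce call above, the lambda can be swapped for operator.and_ from the standard library, and the same pattern gives a logical OR with operator.or_. This is only a sketch: combine_all and combine_any are hypothetical helper names, the three Series are stand-ins for a, b and c from the question, and the list is assumed to be non-empty.

```python
import functools
import operator

import pandas as pd


def combine_all(masks):
    """AND together a non-empty list of boolean Series (hypothetical helper)."""
    # operator.and_(x, y) is the function form of x & y
    return functools.reduce(operator.and_, masks)


def combine_any(masks):
    """OR together a non-empty list of boolean Series (hypothetical helper)."""
    # operator.or_(x, y) is the function form of x | y
    return functools.reduce(operator.or_, masks)


# Stand-ins for the a, b and c masks built in the question
a = pd.Series([False, False, True])
b = pd.Series([False, False, True])
c = pd.Series([False, True, True])

print(combine_all([a, b, c]))
print(combine_any([a, b, c]))
```

Because both helpers just apply & or | pairwise, the result stays a Series and keeps the original index.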


Edit: Using np.logical_and.reduce([logic]) may not work in all situations. Here is a counterexample:

import pandas as pd
import numpy as np
x = pd.Series([True,True,False,False], index=[1,2,3,4]) 
y = pd.Series([True,True,False,False], index=[1,2,3,4]) 
print(x & y)

prints

1     True
2     True
3    False
4    False
dtype: bool

but np.logical_and.reduce([x,y]) raises a ValueError (the traceback below is from pandas 0.13.0 on Python 2.7):

    print(np.logical_and.reduce([x,y]))
  File "/data1/unutbu/.virtualenvs/dev/local/lib/python2.7/site-packages/pandas-0.13.0_98_gd9b0c1f-py2.7-linux-i686.egg/pandas/core/generic.py", line 665, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
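Another way to avoid the NumPy reduction while keeping the index is to stay inside pandas: place the Series side by side as DataFrame columns and reduce across each row. A minimal sketch, reusing x and y from the counterexample; all_rows is a hypothetical helper name, and it assumes every Series shares the same index (rows missing from some Series would need separate handling).

```python
import pandas as pd

x = pd.Series([True, True, False, False], index=[1, 2, 3, 4])
y = pd.Series([True, True, False, False], index=[1, 2, 3, 4])


def all_rows(masks):
    """Row-wise AND of boolean Series that share an index (hypothetical helper)."""
    # concat(axis=1) makes each Series a column; all(axis=1) reduces across columns
    return pd.concat(masks, axis=1).all(axis=1)


combined = all_rows([x, y])
print(combined)
```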

2 Comments

This is pretty useful; can you do a PR to add it to the cookbook? You can use this link with a nice title/description.
I had the same problem but with a logical OR (|) and I came up with sum(my_list_of_serieses).astype(bool).
0

I would use np.all()

import pandas as pd
import numpy as np

# 0o24204 is an octal literal (10372 in decimal); the bare 024204 form only parses in Python 2.
# The rows are passed to the constructor because DataFrame.append was removed in pandas 2.0.
data = pd.DataFrame([
    {'a': 'I have data', 'b': 'no more complexe', 'c': 0o24204},
    {'a': 'audoausd', 'b': '2048rafaf', 'c': 29313},
    {'a': 'koplak ente gan', 'b': 'ente g bisa koplak', 'c': 29313},
], columns=['a', 'b', 'c'])

a = data.a.str.contains('koplak')
b = data.b.str.contains('koplak')
c = data.c > 20000

logic = [a, b, c]

result = np.all(logic, axis=0)
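One thing to keep in mind with np.all is that it returns a plain NumPy array, so the index of the original Series is dropped. If a Series is needed, for example to use the result alongside other pandas objects, it can be wrapped again. A minimal sketch, assuming every Series in the list has the same index in the same order; the three Series are stand-ins for a, b and c, and as_series is just an illustrative name.

```python
import numpy as np
import pandas as pd

# Stand-ins for the a, b and c masks built above
a = pd.Series([False, False, True], name='a')
b = pd.Series([False, False, True], name='b')
c = pd.Series([False, True, True], name='c')

logic = [a, b, c]

# The list is stacked into a 2-D array (one row per Series), then reduced down the rows
result = np.all(logic, axis=0)

# result is a NumPy array; reattach the index to get a Series back
as_series = pd.Series(result, index=a.index)
print(as_series)
```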
