
I'm quite new to decorators and classes in general in Python, and I'd like to know whether there is a better way to decorate pandas objects. As an example, I have written the following to create two methods, lisa and wil:

import numpy as np
import pandas as pd

test = np.array([['john', 'meg', 2.23, 6.49],
       ['lisa', 'wil', 9.67, 8.87],
       ['lisa', 'fay', 3.41, 5.04],
       ['lisa', 'wil', 0.58, 6.12],
       ['john', 'wil', 7.31, 1.74]],
)
test = pd.DataFrame(test)
test.columns = ['name1','name2','scoreA','scoreB']

@pd.api.extensions.register_dataframe_accessor('abc')
class ABCDataFrame:

    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    @property
    def lisa(self):
        return self._obj.loc[self._obj['name1'] == 'lisa']
    @property
    def wil(self):
        return self._obj.loc[self._obj['name2'] == 'wil']

Example output is as follows:

test.abc.lisa.abc.wil
  name1 name2 scoreA scoreB
1  lisa   wil   9.67   8.87
3  lisa   wil   0.58   6.12

I have two questions.

First, in practice I am creating many more than two methods and need to call several of them on the same line. Is there a way to get test.lisa.wil to return the same output as test.abc.lisa.abc.wil above? That would save me from having to type abc each time.

Second, if there are any other suggestions/resources on decorating pandas DataFrames, please let me know.

3 Answers


You can do this with the pandas-flavor library, which allows you to extend the DataFrame class with additional methods.

import pandas as pd
import pandas_flavor as pf

# Create test DataFrame as before.
test = pd.DataFrame([
    ['john', 'meg', 2.23, 6.49],
    ['lisa', 'wil', 9.67, 8.87],
    ['lisa', 'fay', 3.41, 5.04],
    ['lisa', 'wil', 0.58, 6.12],
    ['john', 'wil', 7.31, 1.74]
], columns=['name1', 'name2', 'scoreA', 'scoreB'])

# Register new methods.
@pf.register_dataframe_method
def lisa(df):
    return df.loc[df['name1'] == 'lisa']

@pf.register_dataframe_method
def wil(df):
    return df.loc[df['name2'] == 'wil']

Now it is possible to treat these as methods, without the intermediate .abc accessor.

test.lisa()
#   name1 name2  scoreA  scoreB
# 1  lisa   wil    9.67    8.87
# 2  lisa   fay    3.41    5.04
# 3  lisa   wil    0.58    6.12

test.lisa().wil()
#   name1 name2  scoreA  scoreB
# 1  lisa   wil    9.67    8.87
# 3  lisa   wil    0.58    6.12

Update

Since you have many of these, it is also possible to define a generic function that builds and registers a filtering method, and then call it in a loop.

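def add_method(key, val, fn_name=None):
    # Build a filter that keeps the rows where column `key` equals `val`.
    def fn(df):
        return df.loc[df[key] == val]

    # Default method name is '<column>_<value>', e.g. 'name1_lisa'.
    if fn_name is None:
        fn_name = f'{key}_{val}'

    # The method is registered under the function's __name__.
    fn.__name__ = fn_name
    return pf.register_dataframe_method(fn)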

for name1 in ['john', 'lisa']:
    add_method('name1', name1)

for name2 in ['fay', 'meg', 'wil']:
    add_method('name2', name2)

These then become available as methods, just as if you had defined them directly. Note that I have prefixed each name with the column name (name1 or name2) to be extra clear; that is optional.

test.name1_john()
#   name1 name2  scoreA  scoreB
# 0  john   meg    2.23    6.49
# 4  john   wil    7.31    1.74

test.name1_lisa()
#   name1 name2  scoreA  scoreB
# 1  lisa   wil    9.67    8.87
# 2  lisa   fay    3.41    5.04
# 3  lisa   wil    0.58    6.12

test.name2_fay()
#   name1 name2  scoreA  scoreB
# 2  lisa   fay    3.41    5.04

Update 2

It is also possible for registered methods to have arguments. So another approach is to create one such method per column, with the value as an argument.

@pf.register_dataframe_method
def name1(df, val):
    return df.loc[df['name1'] == val]

@pf.register_dataframe_method
def name2(df, val):
    return df.loc[df['name2'] == val]

test.name1('lisa')
#   name1 name2  scoreA  scoreB
# 1  lisa   wil    9.67    8.87
# 2  lisa   fay    3.41    5.04
# 3  lisa   wil    0.58    6.12

test.name1('lisa').name2('wil')
#   name1 name2  scoreA  scoreB
# 1  lisa   wil    9.67    8.87
# 3  lisa   wil    0.58    6.12
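
If an extra dependency is not wanted, a similar chain can be sketched with DataFrame.pipe, which is built into pandas. This swaps registered methods for a plain function; the helper name where_eq is hypothetical and not part of pandas or pandas-flavor.

```python
import pandas as pd

test = pd.DataFrame([
    ['john', 'meg', 2.23, 6.49],
    ['lisa', 'wil', 9.67, 8.87],
    ['lisa', 'fay', 3.41, 5.04],
    ['lisa', 'wil', 0.58, 6.12],
    ['john', 'wil', 7.31, 1.74]
], columns=['name1', 'name2', 'scoreA', 'scoreB'])


def where_eq(df, column, val):
    """Hypothetical helper: keep the rows of df whose `column` equals `val`."""
    return df.loc[df[column] == val]


# pipe passes the DataFrame as the first argument of where_eq,
# so the filters chain without registering anything on DataFrame.
result = test.pipe(where_eq, 'name1', 'lisa').pipe(where_eq, 'name2', 'wil')
print(result)
```

The trade-off is a slightly longer call in exchange for leaving the DataFrame class itself untouched.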

If you want to get data with test.lisa.wil, I think a wrapper class is more appropriate than a decorator. Personally, I also prefer something like test.access(name1='lisa', name2='wil') to access the data.

Here is an example on how to accomplish it:

import numpy as np
import pandas as pd

test = np.array([['john', 'meg', 2.23, 6.49],
       ['lisa', 'wil', 9.67, 8.87],
       ['lisa', 'fay', 3.41, 5.04],
       ['lisa', 'wil', 0.58, 6.12],
       ['john', 'wil', 7.31, 1.74]],
)
test = pd.DataFrame(test)
test.columns = ['name1','name2','scoreA','scoreB']

class WrapDataFrame(pd.DataFrame):
    def access(self, **kwargs):
        result = self
        for key, val in kwargs.items():
            result = result.loc[result[key] == val]
        return WrapDataFrame(result)
    @property
    def lisa(self):
        return WrapDataFrame(self.loc[self['name1'] == 'lisa'])
    @property
    def wil(self):
        return WrapDataFrame(self.loc[self['name2'] == 'wil'])

wdf = WrapDataFrame(test)

# First way to access
print(wdf.lisa.wil)

# Second way to access (recommended)
print(wdf.access(name1='lisa', name2='wil'))

# Third way to access (easiest to do programmatically)
data_filter = {'name1': 'lisa', 'name2': 'wil'}
print(wdf.access(**data_filter))

Notice that the class WrapDataFrame inherits from pd.DataFrame, so all the usual pandas DataFrame operations should remain available.
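
The explicit WrapDataFrame(...) calls are needed because pandas operations on a subclass return a plain DataFrame by default. A minimal sketch of an alternative, assuming the _constructor hook described in the pandas documentation on subclassing; the class name FilterFrame is hypothetical:

```python
import pandas as pd


class FilterFrame(pd.DataFrame):
    """Hypothetical subclass that survives pandas operations."""

    @property
    def _constructor(self):
        # pandas uses this hook to build the results of most operations.
        return FilterFrame

    def access(self, **kwargs):
        result = self
        for key, val in kwargs.items():
            result = result.loc[result[key] == val]
        return result  # already a FilterFrame, no re-wrapping needed


wdf = FilterFrame([
    ['john', 'meg', 2.23, 6.49],
    ['lisa', 'wil', 9.67, 8.87],
    ['lisa', 'fay', 3.41, 5.04],
    ['lisa', 'wil', 0.58, 6.12],
    ['john', 'wil', 7.31, 1.74]
], columns=['name1', 'name2', 'scoreA', 'scoreB'])

out = wdf.access(name1='lisa', name2='wil')
print(type(out).__name__)
print(out)
```

With this hook in place, properties such as lisa and wil could return self.loc[...] directly and still be chainable.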


You can use a class to help you (although this has little to do with decorators as such).

See the following:

import pandas as pd


class DecoratorDF:
    def __init__(self, df: pd.DataFrame, n_layer: int = 0):
        self.df = df
        self.layer = n_layer

    def __repr__(self):
        return str(self.df)

    def __getattr__(self, item):
        layer = self.df.columns[self.layer]
        return DecoratorDF(self.df.loc[self.df[layer] == item], self.layer + 1)


my_df = DecoratorDF(
    pd.DataFrame([['A', 'B', 'C'],
                  ['A', 'B', 'D'],
                  ['E', 'F', 'G'],
                  ], columns=['name1', 'name2', 'name3'])
)

print(my_df.A.B)
print(my_df.A.B.C)
  name1 name2 name3
0     A     B     C
1     A     B     D

  name1 name2 name3
0     A     B     C

Full Example

import numpy as np
import pandas as pd


class DecoratorDF:
    def __init__(self, df: pd.DataFrame, n_layer: int = 0):
        self.df = df
        self.layer = n_layer

    def __repr__(self):
        return str(self.df)

    def __getattr__(self, item):
        layer = self.df.columns[self.layer]
        return DecoratorDF(self.df.loc[self.df[layer] == item], self.layer + 1)


test_data = np.array([['john', 'meg', 2.23, 6.49],
                      ['lisa', 'wil', 9.67, 8.87],
                      ['lisa', 'fay', 3.41, 5.04],
                      ['lisa', 'wil', 0.58, 6.12],
                      ['john', 'wil', 7.31, 1.74]],
                     )
test_df = pd.DataFrame(test_data, columns=['name1', 'name2', 'scoreA', 'scoreB'])
test_df = DecoratorDF(test_df)
df_lisa_and_wil = test_df.lisa.wil
print(df_lisa_and_wil)

df_lisa_and_wil = df_lisa_and_wil.df
# np.array stored every value as a string, so scoreA is compared with '9.67'.
print(df_lisa_and_wil.loc[df_lisa_and_wil['scoreA'] == '9.67'])

  name1 name2 scoreA scoreB
1  lisa   wil   9.67   8.87
3  lisa   wil   0.58   6.12

  name1 name2 scoreA scoreB
1  lisa   wil   9.67   8.87
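
One caveat with the class above: __getattr__ is called for every attribute that normal lookup does not find, so any unknown name is treated as a filter value, and a lookup past the last column raises an IndexError. A possible guarded variant is sketched below; the class name LayeredFilter and the choice to reject names starting with an underscore are assumptions, not part of the answer's code.

```python
import copy

import pandas as pd


class LayeredFilter:
    """Hypothetical variant of DecoratorDF with guarded attribute lookup."""

    def __init__(self, df: pd.DataFrame, n_layer: int = 0):
        self.df = df
        self.layer = n_layer

    def __repr__(self):
        return str(self.df)

    def __getattr__(self, item):
        # Only reached when normal lookup fails. Reject private and dunder
        # names so that probes from copy or pickle do not become filters.
        if item.startswith('_'):
            raise AttributeError(item)
        # Fail clearly once every column has been used as a filter layer.
        if self.layer >= len(self.df.columns):
            raise AttributeError(f'no column left to filter by {item!r}')
        column = self.df.columns[self.layer]
        return LayeredFilter(self.df.loc[self.df[column] == item], self.layer + 1)


frame = pd.DataFrame([['A', 'B', 'C'],
                      ['A', 'B', 'D'],
                      ['E', 'F', 'G']],
                     columns=['name1', 'name2', 'name3'])
lf = LayeredFilter(frame)
filtered = lf.A.B       # rows where name1 == 'A' and name2 == 'B'
clone = copy.copy(lf)   # copying works because '_' names are rejected
print(filtered)
```

Values that are not valid Python identifiers, or that start with an underscore, cannot be reached through attribute access in either version.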
