
I have a DataFrame from which I want to normalize some arbitrary columns using another arbitrary column:

import itertools as it
import numpy as np
import pandas as pd

header = ('h_seqNum', 'h_stamp', 'user_id')
joints = ('head', 'neck', 'torso')
attribs = ('pos_x', 'pos_y', 'pos_z')

# zip(*product(...)) splits the (joint, attrib) pairs into two parallel tuples
# (works on Python 2 and 3, unlike izip/.next())
level_joints, level_attribs = zip(*it.product(joints, attribs))
multiind_first = list(it.chain(['header'] * len(header), level_joints, ['pose']))
multiind_second = list(it.chain(header, level_attribs, ['pose']))

df = pd.DataFrame(np.random.rand(65).reshape(5, 13),
                  columns=pd.MultiIndex.from_arrays([multiind_first, multiind_second],
                                                    names=['joint', 'attrib']))

The resulting DataFrame looks something like this:

joint    header                            head                       neck                       torso                      pose
attrib   h_seqNum    h_stamp    user_id    pos_x    pos_y    pos_z    pos_x    pos_y    pos_z    pos_x    pos_y    pos_z    pose
0        0.681       0.059      0.607      0.093    0.504    0.975    0.317    0.739    0.129    0.759    0.254    0.814    1
1        0.914       0.420      0.305      0.242    0.700    0.180    0.324    0.171    0.477    0.943    0.877    0.069    0
2        0.522       0.395      0.118      0.739    0.653    0.326    0.947    0.517    0.036    0.647    0.079    0.227    0
3        0.475       0.815      0.792      0.208    0.472    0.427    0.213    0.544    0.440    0.033    0.636    0.527    2
4        0.767       0.774      0.983      0.646    0.949    0.947    0.402    0.015    0.913    0.734    0.192    0.032    0    
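For comparison, the same (joint, attrib) column layout can also be built directly from a list of tuples with `pd.MultiIndex.from_tuples`, which avoids the two-pass `izip`/`next()` bookkeeping. This is a sketch: `make_frame` is a hypothetical helper name, and it fills the frame with deterministic `np.arange` values (row r, column c holds 13 * r + c) instead of random numbers so results are reproducible.

```python
import itertools as it

import numpy as np
import pandas as pd

header = ('h_seqNum', 'h_stamp', 'user_id')
joints = ('head', 'neck', 'torso')
attribs = ('pos_x', 'pos_y', 'pos_z')


def make_frame(n_rows=5):
    """Hypothetical helper: build the example frame with (joint, attrib) columns."""
    # One (joint, attrib) tuple per column, in display order.
    tuples = ([('header', h) for h in header]
              + list(it.product(joints, attribs))
              + [('pose', 'pose')])
    columns = pd.MultiIndex.from_tuples(tuples, names=['joint', 'attrib'])
    # Deterministic values: row r, column c holds 13 * r + c.
    data = np.arange(n_rows * len(tuples), dtype=float).reshape(n_rows, len(tuples))
    return pd.DataFrame(data, columns=columns)


df = make_frame()
print(df.columns.get_level_values('joint').unique().tolist())
# ['header', 'head', 'neck', 'torso', 'pose']
```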

I want to normalize all the columns (attribs) belonging to an arbitrary joint (e.g. 'head') using another arbitrary joint (e.g. 'torso'), something like this:

df['head'] = df['head'] - df['torso']
df['neck'] = df['neck'] - df['torso']
# Note that torso remains "unnormalized"

To do so, I wrote a function:

def normalize_joints(df, from_joint):
    joint_names = set(joints) - set([from_joint,])
    for j in list(joint_names):
        df[j] = df[j] - df[from_joint]

However, when I execute this function I get the following error:

normalize_joints(df, 'torso')

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-414-47f39f04716d> in <module>()
----> 1 normalize_joints(df, 'torso')

<ipython-input-407-cf13a67fabd8> in normalize_joints(df, from_joint)
      2     joint_names = set(joints) - set([from_joint,])
      3     for j in list(joint_names):
----> 4         df[j] = df[j] - df[from_joint]

/Library/Python/2.7/site-packages/pandas/core/frame.pyc in __setitem__(self, key, value)
   2117                                          fill_value, limit, takeable=takeable)
   2118 
-> 2119         return frame
   2120 
   2121     def _reindex_index(self, new_index, method, copy, level, fill_value=NA,

/Library/Python/2.7/site-packages/pandas/core/frame.pyc in _set_item(self, key, value)
   2164     @Appender(_shared_docs['reindex_axis'] % _shared_doc_kwargs)
   2165     def reindex_axis(self, labels, axis=0, method=None, level=None, copy=True,
-> 2166                      limit=None, fill_value=np.nan):
   2167         return super(DataFrame, self).reindex_axis(labels=labels, axis=axis,
   2168                                                    method=method, level=level,

/Library/Python/2.7/site-packages/pandas/core/generic.pyc in _set_item(self, key, value)
    677 
    678     __bool__ = __nonzero__
--> 679 
    680     def bool(self):
    681         """ Return the bool of a single element PandasObject

/Library/Python/2.7/site-packages/pandas/core/internals.pyc in set(self, item, value)
   1768     def sp_index(self):
   1769         return self.values.sp_index
-> 1770 
   1771     @property
   1772     def kind(self):

/Library/Python/2.7/site-packages/pandas/core/internals.pyc in _reset_ref_locs(self)
   1054         # see if we can align other
   1055         if hasattr(other, 'reindex_axis'):
-> 1056             if align:
   1057                 axis = getattr(other, '_info_axis_number', 0)
   1058                 other = other.reindex_axis(self.items, axis=axis,

/Library/Python/2.7/site-packages/pandas/core/internals.pyc in _rebuild_ref_locs(self)
   1062 
   1063         # make sure that we can broadcast
-> 1064         is_transposed = False
   1065         if hasattr(other, 'ndim') and hasattr(values, 'ndim'):
   1066             if values.ndim != other.ndim or values.shape == other.shape[::-1]:

AttributeError: _ref_locs

After several tries I have not been able to locate the source of my error. If I perform the operation

df['head'] - df['torso']

it returns a DataFrame with the correct result. However, when I try to assign this DataFrame to df['head'], I get the error shown above.
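One way to see why the assignment is not straightforward: selecting `df['head']` drops the `joint` level, so the subtraction result only carries the attrib names as columns, while the target columns in `df` are full (joint, attrib) tuples. A quick check, assuming a small deterministic stand-in frame with only the joint columns (not the question's random data):

```python
import itertools as it

import numpy as np
import pandas as pd

joints = ('head', 'neck', 'torso')
attribs = ('pos_x', 'pos_y', 'pos_z')

# Stand-in frame: only the joint columns, values 0..26 filled row by row.
columns = pd.MultiIndex.from_tuples(list(it.product(joints, attribs)),
                                    names=['joint', 'attrib'])
df = pd.DataFrame(np.arange(27, dtype=float).reshape(3, 9), columns=columns)

diff = df['head'] - df['torso']
print(diff.columns.tolist())    # ['pos_x', 'pos_y', 'pos_z']  (joint level is gone)
print(df.columns[:3].tolist())  # [('head', 'pos_x'), ('head', 'pos_y'), ('head', 'pos_z')]
```

Assigning `diff` back therefore requires pandas to map each flat attrib column onto the matching (joint, attrib) key, which is the step that failed in the old version.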

Is there any way to perform this assignment?

Moreover, I was wondering whether there are better ways to perform the same normalization than the one I am trying. Perhaps using groupby and then applying the normalize function to the selected DataFrame?

EDIT:

This error occurred with numpy 1.6 and pandas 0.12

After upgrading to numpy 1.8 and pandas 0.13 the following operation is valid:

df['head'] = df['head'] - df['torso']
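A sketch of the whole normalization built on that direct assignment, assuming pandas 0.13 or later as in the edit above (written with current releases in mind). `normalize_joints` here is a hypothetical revision of the question's function: it takes the joint names as a parameter instead of reading a global, and returns a copy instead of mutating its argument. The data is a deterministic stand-in frame.

```python
import itertools as it

import numpy as np
import pandas as pd

joints = ('head', 'neck', 'torso')
attribs = ('pos_x', 'pos_y', 'pos_z')


def normalize_joints(df, from_joint, joints):
    """Subtract the reference joint's attribs from every other joint (returns a copy)."""
    out = df.copy()
    for j in joints:
        if j == from_joint:
            continue  # the reference joint stays unnormalized
        # pandas matches the result's attrib columns back to the (j, attrib) keys
        out[j] = out[j] - df[from_joint]
    return out


columns = pd.MultiIndex.from_tuples(list(it.product(joints, attribs)),
                                    names=['joint', 'attrib'])
df = pd.DataFrame(np.arange(27, dtype=float).reshape(3, 9), columns=columns)
normalized = normalize_joints(df, 'torso', joints)
```

Returning a copy is a design choice, not a requirement; drop the `df.copy()` to normalize in place as the original function does.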
  • In your first code block, you need to replace multiind_first with mi_level_one and multiind_second with mi_level_two. Commented Feb 17, 2014 at 15:21
  • Replaced. Just an issue of copy-pasting my code. Thanks! Commented Feb 17, 2014 at 15:23

2 Answers


The problem is that your columns are a MultiIndex, so assign to the full (joint, attrib) keys. Try this:

def normalize_joints(df, from_joint):
    joint_names = set(joints) - set([from_joint,])
    for j in list(joint_names):
        keys = [(j,c) for c in attribs]
        df[keys] = df[j] - df[from_joint]

print(df)
normalize_joints(df, 'torso')
print(df)

Output:

joint     header                          head                          neck                         torso                          pose
attrib  h_seqNum   h_stamp   user_id     pos_x     pos_y     pos_z     pos_x     pos_y     pos_z     pos_x     pos_y     pos_z      pose
0       0.067366  0.957394  0.983969  0.602662  0.505270  0.990675  0.753841  0.598397  0.846479  0.757155  0.220009  0.328470  0.686525
1       0.806405  0.800388  0.302178  0.935559  0.180360  0.322767  0.230457  0.617555  0.602589  0.109482  0.181803  0.311266  0.929481
2       0.649677  0.237286  0.963088  0.370463  0.471590  0.489256  0.060383  0.070885  0.858312  0.306232  0.511731  0.257015  0.283287
3       0.054800  0.127925  0.099985  0.700160  0.211256  0.026782  0.820380  0.922593  0.600130  0.100745  0.418157  0.869735  0.597275
4       0.678372  0.334520  0.247894  0.616133  0.914610  0.229628  0.317488  0.224910  0.620222  0.952499  0.946568  0.539502  0.838473
joint     header                          head                          neck                         torso                          pose
attrib  h_seqNum   h_stamp   user_id     pos_x     pos_y     pos_z     pos_x     pos_y     pos_z     pos_x     pos_y     pos_z      pose
0       0.067366  0.957394  0.983969 -0.154493  0.285261  0.662205 -0.003314  0.378387  0.518009  0.757155  0.220009  0.328470  0.686525
1       0.806405  0.800388  0.302178  0.826077 -0.001443  0.011501  0.120975  0.435752  0.291322  0.109482  0.181803  0.311266  0.929481
2       0.649677  0.237286  0.963088  0.064231 -0.040141  0.232241 -0.245850 -0.440846  0.601297  0.306232  0.511731  0.257015  0.283287
3       0.054800  0.127925  0.099985  0.599414 -0.206900 -0.842953  0.719635  0.504436 -0.269605  0.100745  0.418157  0.869735  0.597275
4       0.678372  0.334520  0.247894 -0.336366 -0.031958 -0.309874 -0.635011 -0.721658  0.080719  0.952499  0.946568  0.539502  0.838473
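A variation on this answer, offered as a sketch: instead of rebuilding the full keys from the global `attribs` tuple, they can be read off the MultiIndex itself with `get_level_values`. `full_keys` is a hypothetical helper name. Subtracting `.to_numpy()` arrays makes the pairing purely positional, so this assumes every joint lists its attribs in the same order as the reference joint. Like the answer's version, it modifies `df` in place.

```python
import itertools as it

import numpy as np
import pandas as pd


def full_keys(columns, joint):
    """Hypothetical helper: all (joint, attrib) tuples under one top-level joint."""
    mask = columns.get_level_values('joint') == joint
    return columns[mask].tolist()


def normalize_joints(df, from_joint, joints):
    # Reference values as a plain array, so the subtraction below is positional.
    ref = df[full_keys(df.columns, from_joint)].to_numpy()
    for j in joints:
        if j != from_joint:
            keys = full_keys(df.columns, j)
            df[keys] = df[keys].to_numpy() - ref


# Deterministic stand-in frame: row r, column c holds 10 * r + c.
columns = pd.MultiIndex.from_tuples(
    [('header', 'h_seqNum')]
    + list(it.product(('head', 'neck', 'torso'), ('pos_x', 'pos_y', 'pos_z'))),
    names=['joint', 'attrib'])
df = pd.DataFrame(np.arange(30, dtype=float).reshape(3, 10), columns=columns)
normalize_joints(df, 'torso', ('head', 'neck', 'torso'))
```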

4 Comments

Thanks, @xndrme. Your answer raises another question: if df['head'] - df['torso'] produces a pd.DataFrame with the same results as your answer, why is it not possible to assign it to df['head']? I understand that it must be related to the MultiIndex, but I do not see why.
The problem is that df['head'] on a MultiIndex is only a partial key. That works for getting the data, but for setting it seems you have to provide the full multi-level keys (I think this has to do with how pandas is implemented; maybe one of its developers could answer your question better ;)
Somehow it seems that the developers had this issue in mind. Upgrading to numpy 1.8 and pandas 0.13 fixed the problem.
@VGonPa, well, I think I need an upgrade too :)

I believe that I have found a rather simple solution:

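def normalize(df, from_joint):
    # drop the non-joint columns and the reference joint, then subtract the
    # reference joint, broadcasting along the attrib level (level 1)
    return df.drop(['header', 'pose', from_joint], axis=1, level='joint').sub(df[from_joint], level=1)

df.update(normalize(df, 'torso'))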
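A self-contained check of this approach, assuming a recent pandas and deterministic values (row r, column c holds 13 * r + c) in place of the random frame. `sub(..., level='attrib')` broadcasts the reference joint's columns across every remaining joint by matching on the `attrib` level (equivalent to `level=1` here), and `df.update` writes the result back in place, overwriting only where the new values are not NaN.

```python
import itertools as it

import numpy as np
import pandas as pd

header = ('h_seqNum', 'h_stamp', 'user_id')
joints = ('head', 'neck', 'torso')
attribs = ('pos_x', 'pos_y', 'pos_z')

tuples = ([('header', h) for h in header]
          + list(it.product(joints, attribs))
          + [('pose', 'pose')])
columns = pd.MultiIndex.from_tuples(tuples, names=['joint', 'attrib'])
df = pd.DataFrame(np.arange(65, dtype=float).reshape(5, 13), columns=columns)


def normalize(df, from_joint):
    # Keep only the joints to normalize, then subtract the reference joint,
    # aligning on the 'attrib' level so pos_x - pos_x, pos_y - pos_y, ...
    others = df.drop(['header', 'pose', from_joint], axis=1, level='joint')
    return others.sub(df[from_joint], level='attrib')


df.update(normalize(df, 'torso'))
```

Compared with the loop in the first answer, this handles all non-reference joints in one vectorized call; the trade-off is that the non-joint top-level labels ('header', 'pose') have to be listed explicitly.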

