
I am having trouble populating a pandas DataFrame. I am following the instructions found here to produce a MultiIndex DataFrame. The examples work fine, except that I want to store an array instead of a single value.

import numpy as np
import pandas as pd

activity = 'Open_Truck'
id = 1
index = pd.MultiIndex.from_tuples([(activity, id)], names=['activity', 'id'])
v = pd.Series(np.random.randn(1, 5), index=index)

Exception: Data must be 1-dimensional

If I replace randn(1, 5) with randn(1), it works fine. randn(1, 1) also works once I flatten it with randn(1, 1).flatten('F'). But when I try the same with randn(1, 5):

v = pd.Series(np.random.randn(1, 5).flatten('F'), index=index)

ValueError: Wrong number of items passed 5, placement implies 1

My intention is to store one feature vector per row, one for each (activity, id) pair. (In the real scenario the vectors are np.array objects, not np.random.randn output.)
So, how do I add an array to a MultiIndex DataFrame?

Edit:
As I am new to pandas, I confused Series with DataFrame. I can achieve the above with a DataFrame, which is two-dimensional by default:

arrays = [np.array(['Open_Truck']*2),
            np.array(['1', '2'])]
df = pd.DataFrame(np.random.randn(2, 4), index=arrays)
df
                     0         1         2         3
Open_Truck 1 -0.210923  0.184874 -0.060210  0.301924
           2  0.773249  0.175522 -0.408625 -0.331581
  • I see your edit; the same principle applies: the data must have the same length as the MultiIndex. Commented May 14, 2018 at 9:18

1 Answer


The problem is that the MultiIndex has only one tuple while the data has length 5, so the lengths do not match. Repeat the tuple so that the index also has 5 entries:

activity = 'Open_Truck'
id = 1
# repeat the tuple 5 times so the index length matches the data length
index = pd.MultiIndex.from_tuples([(activity, id)] * 5, names=['activity', 'id'])
print (index)
MultiIndex(levels=[['Open_Truck'], [1]],
           labels=[[0, 0, 0, 0, 0], [0, 0, 0, 0, 0]],
           names=['activity', 'id'])

print (len(index))
5

v = pd.Series(np.random.randn(1, 5).flatten('F'), index=index)
print (v)
activity    id
Open_Truck  1    -1.348832
            1    -0.706780
            1     0.242352
            1     0.224271
            1     1.112608
dtype: float64

In the first approach the lengths are the same (1), because there is only one tuple in the list:

activity = 'Open_Truck'
id = 1
index = pd.MultiIndex.from_tuples([(activity, id)], names=['activity', 'id'])

print (len(index))
1

v = pd.Series(np.random.randn(1), index=index)
print (v)
activity    id
Open_Truck  1    -1.275131
dtype: float64
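
If the goal is to keep one index entry per (activity, id) and still hold a whole vector, one possible alternative (not what the examples above do) is to wrap the array in a list, so the Series has a single element of object dtype. This is only a sketch; the names `features` and `id_` are made up for illustration, and object columns lose most of the vectorised numeric operations:

```python
import numpy as np
import pandas as pd

activity = 'Open_Truck'
id_ = 1  # hypothetical name, avoids shadowing the built-in id()

# Stand-in for a real feature vector of length 5.
features = np.random.randn(5)

index = pd.MultiIndex.from_tuples([(activity, id_)], names=['activity', 'id'])

# The list has length 1, which matches len(index); its single element
# is the whole array, so the Series ends up with dtype object.
v = pd.Series([features], index=index)
```

Here `v.iloc[0]` gives back the full array, and `len(v)` equals `len(index)`, which is the same length rule as in the examples above.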

1 Comment

Yeah, but in this way I get an array column-wise. Is there any way to get it row-wise? When I append new arrays they are added below the previous one.
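
One way to get the row-wise layout asked about here is the DataFrame approach from the question's edit: build a one-row DataFrame per (activity, id) and stack the rows with `pd.concat`. A minimal sketch, assuming every feature vector has the same length; `make_row` is a hypothetical helper written for this example, not a pandas function:

```python
import numpy as np
import pandas as pd

def make_row(activity, id_, features):
    """Return a one-row DataFrame indexed by (activity, id)."""
    index = pd.MultiIndex.from_tuples([(activity, id_)],
                                      names=['activity', 'id'])
    # reshape(1, -1) turns a length-n vector into one row with n columns.
    return pd.DataFrame(np.asarray(features).reshape(1, -1), index=index)

rows = [
    make_row('Open_Truck', 1, np.random.randn(5)),
    make_row('Open_Truck', 2, np.random.randn(5)),
]

# Each feature vector becomes one row; the columns are the vector positions.
df = pd.concat(rows)
```

Collecting the rows in a list and calling `pd.concat` once is usually cheaper than growing the DataFrame one row at a time.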
