When converting a pandas.MultiIndex to a numpy.ndarray, the result is a one-dimensional ndarray with dtype=object, as the following example shows:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [10, 20, 30, 40, 50, 60],
    'B': [0, 1, 2, 3, 4, 5],
    'C': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5']
}).set_index(['A', 'B'])

The df will be:

       C
A  B
10 0  K0
20 1  K1
30 2  K2
40 3  K3
50 4  K4
60 5  K5

The output of df.index.to_numpy() is a one-dimensional ndarray with dtype=object:

array([(10, 0), (20, 1), (30, 2), (40, 3), (50, 4), (60, 5)], dtype=object)

but I want:

array([[10,  0],
       [20,  1],
       [30,  2],
       [40,  3],
       [50,  4],
       [60,  5]])

In the question "How to convert a NumPy 2D array with object dtype to a regular 2D array of floats", I found the following solution:

np.vstack(df.index)

Is there a more direct or better solution?

  • What's the problem with the current solution? Commented Mar 3, 2021 at 1:26
  • What do you mean by better? Isn't np.vstack(df.index) precisely the desired output? Commented Mar 3, 2021 at 1:27
  • Yeah, the current solution seems fine, but I was wondering whether there is any case where it won't work, or whether pandas can give me the correct output directly, without np.vstack. Commented Mar 4, 2021 at 19:08
  • I was also thinking there might be a downside to my method compared to, say, @delimiter's solution below (in terms of type conversion or the like), so I wanted other people to double-check it. Commented Mar 4, 2021 at 19:16

2 Answers

I am pretty sure you will get what you want by flattening the MultiIndex into a list of tuples and building a NumPy array from the result, e.g. with the following syntax:

np.array(list(df.index))
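
As a minimal sketch of why this works, using the df from the question: list(df.index) yields one tuple per row, and np.array treats each tuple as a row of a 2D array. The variable names tuples and arr are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [10, 20, 30, 40, 50, 60],
    'B': [0, 1, 2, 3, 4, 5],
    'C': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'],
}).set_index(['A', 'B'])

# Iterating a MultiIndex gives one tuple per row: [(10, 0), (20, 1), ...]
tuples = list(df.index)

# NumPy reads a list of equal-length tuples as rows of a 2D array,
# instead of a 1D object array holding tuples.
arr = np.array(tuples)

print(arr.shape)       # (6, 2)
print(arr.dtype.kind)  # i (an integer dtype; the exact width is platform dependent)
```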

2 Comments

This works, too. I was wondering whether this is better (faster, more applicable to all situations, etc.) than the one I found.
That is something you would need to measure yourself. There are ways of doing it, but the difference is unlikely to be noticeable unless your dataset is sizeable. In the meantime, don't hesitate to accept the response to your liking.
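
One way to do that measurement is with the standard-library timeit module. The sketch below is only an illustration: the index size n, the repeat counts, and the synthetic data are arbitrary assumptions, and the numbers will vary by machine and library version, so no result is claimed here.

```python
import timeit

import numpy as np
import pandas as pd

n = 10_000  # arbitrary size chosen for illustration
idx = pd.MultiIndex.from_arrays(
    [np.arange(n), np.arange(n) * 10], names=['A', 'B']
)
df = pd.DataFrame({'C': np.zeros(n)}, index=idx)

# The three approaches that appear in this thread.
candidates = {
    'np.vstack(df.index)': lambda: np.vstack(df.index),
    'np.array(list(df.index))': lambda: np.array(list(df.index)),
    "df.reset_index()[['A', 'B']].values":
        lambda: df.reset_index()[['A', 'B']].values,
}

timings = {}
for label, func in candidates.items():
    # Best of 3 runs, each run calling the function 5 times.
    best = min(timeit.repeat(func, number=5, repeat=3)) / 5
    timings[label] = best
    print(f'{label}: {best * 1e3:.2f} ms per call')
```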

Turn the index levels into columns:

df.reset_index()[['A', 'B']].values
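
A self-contained sketch of this approach on the df from the question. It uses to_numpy(), which the pandas documentation recommends over .values, and also shows MultiIndex.to_frame as a variant that skips the column selection. The to_frame variant is not from this thread, and the names from_columns and from_frame are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [10, 20, 30, 40, 50, 60],
    'B': [0, 1, 2, 3, 4, 5],
    'C': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'],
}).set_index(['A', 'B'])

# Move the index levels back into columns, then keep only those columns.
from_columns = df.reset_index()[['A', 'B']].to_numpy()

# Variant: build a DataFrame from the index levels alone.
from_frame = df.index.to_frame(index=False).to_numpy()

print(from_columns.shape)                        # (6, 2)
print(np.array_equal(from_columns, from_frame))  # True
```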

1 Comment

This can work, too. I'm still wondering which method is better, faster, or more general. For example, is it possible that one of the solutions given so far changes the dtype of the cells (e.g. from int to float or the other way around)?
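
A small check of the dtype question, under the assumption of a hypothetical index with one integer level and one float level (not the data from the question). Because a NumPy array has a single dtype, all three approaches from this thread should upcast the integer level to float here. Reading each level separately with get_level_values keeps the per-level dtypes.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-dtype index: level A is integer, level B is float.
idx = pd.MultiIndex.from_arrays(
    [[1, 2, 3], [0.5, 1.5, 2.5]], names=['A', 'B']
)
df = pd.DataFrame({'C': ['K0', 'K1', 'K2']}, index=idx)

stacked = np.vstack(df.index)
from_list = np.array(list(df.index))
from_columns = df.reset_index()[['A', 'B']].values

for name, arr in [('vstack', stacked),
                  ('array(list)', from_list),
                  ('reset_index', from_columns)]:
    print(name, arr.dtype)  # float64 in each case

# Per-level extraction preserves each level's own dtype.
level_a = df.index.get_level_values('A').to_numpy()
level_b = df.index.get_level_values('B').to_numpy()
print(level_a.dtype.kind, level_b.dtype.kind)  # i f
```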
