
I have a dataframe and want to convert it into a numpy array to plot its values. The dataframe looks like this:

>>> df_ohlc
                        open       high        low      close
Date                                                           
2018-03-07 03:35:00  62.189999  62.189999  62.169998  62.180000
2018-03-07 03:36:00  62.180000  62.180000  62.160000  62.180000
2018-03-07 03:37:00  62.169998  62.220001  62.169998  62.209999
2018-03-07 03:38:00  62.220001  62.220001  62.189999  62.200001
...
[480 rows x 4 columns]

>>> df_ohlc.index
DatetimeIndex(['2018-03-07 03:35:00', '2018-03-07 03:36:00',
            '2018-03-07 03:37:00', '2018-03-07 03:38:00',
            '2018-03-07 03:39:00', '2018-03-07 03:40:00',
            '2018-03-07 03:41:00', '2018-03-07 03:42:00',
            '2018-03-07 03:43:00', '2018-03-07 03:44:00',
            ...
            '2018-03-07 11:25:00', '2018-03-07 11:26:00',
            '2018-03-07 11:27:00', '2018-03-07 11:28:00',
            '2018-03-07 11:29:00', '2018-03-07 11:30:00',
            '2018-03-07 11:31:00', '2018-03-07 11:32:00',
            '2018-03-07 11:33:00', '2018-03-07 11:34:00'],
            dtype='datetime64[ns]', name='Date', length=480, freq='T')

>>> df_ohlc.index[0]
Timestamp('2018-03-07 03:35:00', freq='T')  # and why is it a Timestamp when the index said dtype='datetime64[ns]' right before?
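On the Timestamp question: a DatetimeIndex stores its data as a datetime64[ns] array, but pandas returns single elements as Timestamp objects, its own scalar type (a subclass of Python's datetime.datetime). The sketch below checks this on a small hypothetical frame standing in for df_ohlc; the names idx, df and first are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_ohlc: three one-minute bars with a DatetimeIndex
idx = pd.date_range("2018-03-07 03:35", periods=3, freq="min", name="Date")
df = pd.DataFrame({"close": [62.18, 62.18, 62.21]}, index=idx)

# The index keeps its underlying data as a datetime64[ns] array...
print(df.index.dtype)
print(df.index.values.dtype)

# ...but indexing a single element "boxes" it as a pandas Timestamp
first = df.index[0]
print(type(first))

# A Timestamp can be turned back into a NumPy datetime64 scalar when needed
print(repr(first.to_datetime64()))
```

So the dtype shown by the index describes the whole array, while the Timestamp is only how pandas presents one element of it.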

But when I try to convert it, the type of the index (the Date column) changes from datetime64[ns] to Timestamp.

>>> df_ohlc.reset_index().values
array([[Timestamp('2018-03-07 03:35:00'), 62.189998626708984,
        62.189998626708984, 62.16999816894531, 62.18000030517578],
    [Timestamp('2018-03-07 03:36:00'), 62.18000030517578,
        62.18000030517578, 62.15999984741211, 62.18000030517578],
    [Timestamp('2018-03-07 03:37:00'), 62.16999816894531,
        62.220001220703125, 62.16999816894531, 62.209999084472656],
    ..., 
    [Timestamp('2018-03-07 11:32:00'), 61.939998626708984,
        61.95000076293945, 61.93000030517578, 61.93000030517578],
    [Timestamp('2018-03-07 11:33:00'), 61.93000030517578,
        61.939998626708984, 61.900001525878906, 61.90999984741211],
    [Timestamp('2018-03-07 11:34:00'), 61.90999984741211,
        61.91999816894531, 61.900001525878906, 61.91999816894531]], dtype=object)

Why does it happen and how can I keep the type as datetime64?

I tried separating the dataframe's index and concatenating it with the values afterwards, but it raises an error. I'd like to know what I did wrong.

>>> index_ohlc = np.array([ df_ohlc.index.values.astype('datetime64[s]'), ]).T

>>> index_ohlc.shape
(480, 1)

>>> value_ohlc = df_ohlc.values     

>>> value_ohlc.shape
(480, 4)

>>> type(index_ohlc)
<class 'numpy.ndarray'>

>>> type(value_ohlc)
<class 'numpy.ndarray'>

>>> new_array = np.concatenate( (index_ohlc, value_ohlc), axis = 1 )
Traceback (most recent call last):
File "<console>", line 1, in <module>
TypeError: invalid type promotion
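The concatenation fails because NumPy has no common dtype that can hold both datetime64 and float64 values, so np.concatenate cannot promote the two inputs to one result dtype. A minimal sketch reproducing this with two small hypothetical arrays shaped like index_ohlc and value_ohlc, plus two possible workarounds (variable names are illustrative; the exact error message depends on the NumPy version):

```python
import numpy as np

# Hypothetical small arrays shaped like index_ohlc (n, 1) and value_ohlc (n, 4)
dates = np.array(["2018-03-07T03:35", "2018-03-07T03:36"],
                 dtype="datetime64[s]").reshape(-1, 1)
values = np.array([[62.19, 62.19, 62.17, 62.18],
                   [62.18, 62.18, 62.16, 62.18]])

# datetime64 and float64 have no common dtype, so this raises a TypeError
try:
    np.concatenate((dates, values), axis=1)
    concat_ok = True
except TypeError as exc:
    concat_ok = False
    print("TypeError:", exc)

# Workaround 1: keep two arrays, e.g. dates as x and the close column as y
x = dates.ravel()
y = values[:, 3]

# Workaround 2: store dates as seconds since the epoch so everything is float64
seconds = dates.astype("int64").astype(np.float64)
numeric = np.concatenate((seconds, values), axis=1)
print(numeric.dtype, numeric.shape)
```

Keeping the arrays separate preserves the datetime64 dtype; the epoch-seconds version gives a single float64 array but the dates are then plain numbers.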
  • As long as your array has mixed types (datetime as well as float), its dtype isn't going to be anything other than object. I'd recommend taking the index out separately from the values. Commented Mar 7, 2018 at 10:58
  • @cᴏʟᴅsᴘᴇᴇᴅ Thank you for your advice. I think I had tried what you said and got a TypeError. Do you happen to know what caused it? Commented Mar 7, 2018 at 11:08
  • No, I don't have the code that produces that error... Commented Mar 7, 2018 at 11:09
  • I did you a favor and deleted the second, unrelated question after your first question. You should feel free to post it as a separate topic (it's actually easier to answer than the first question). Commented Mar 7, 2018 at 12:41
  • @John Zwinck Thank you! Commented Mar 9, 2018 at 0:16
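To illustrate the first comment: a single array built from mixed datetime and float columns falls back to dtype=object, whereas taking the index out separately keeps datetime64[ns] for the dates and float64 for the prices. A sketch on a small hypothetical frame (values rounded from the question's output, names illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_ohlc
idx = pd.date_range("2018-03-07 03:35", periods=3, freq="min", name="Date")
df = pd.DataFrame({"open": [62.19, 62.18, 62.17],
                   "high": [62.19, 62.18, 62.22],
                   "low": [62.17, 62.16, 62.17],
                   "close": [62.18, 62.18, 62.21]}, index=idx)

# One array holding both datetimes and floats has to use dtype=object
mixed = df.reset_index().to_numpy()
print(mixed.dtype)

# Taking the index out separately keeps each part's native dtype
dates = df.index.to_numpy()   # datetime64[ns]
values = df.to_numpy()        # float64, one column per price field
print(dates.dtype, values.dtype, values.shape)
```

The two arrays can then be passed separately to a plotting call, for example with dates as the x values and one price column as the y values.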

1 Answer


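Try a NumPy structured array: each field has its own dtype, so the date column can stay datetime64[ns] while the prices stay float64.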

Demo

import numpy as np
import pandas as pd
from pandas import Timestamp

# Rebuild a small sample of df_ohlc from the output printed in the question
df = pd.DataFrame(np.array([[Timestamp('2018-03-07 03:35:00'), 62.189998626708984,
        62.189998626708984, 62.16999816894531, 62.18000030517578],
    [Timestamp('2018-03-07 03:36:00'), 62.18000030517578,
        62.18000030517578, 62.15999984741211, 62.18000030517578],
    [Timestamp('2018-03-07 03:37:00'), 62.16999816894531,
        62.220001220703125, 62.16999816894531, 62.209999084472656]]))

# One field per column: the date keeps datetime64[ns], the prices are float64
dt = np.dtype([("Date", 'datetime64[ns]'),
               ("f1", np.float64),
               ("f2", np.float64),
               ("f3", np.float64),
               ("f4", np.float64)])

# Each row becomes a tuple, which NumPy maps onto the fields of dt
arr = np.array([tuple(v) for v in df.values.tolist()], dtype=dt)
print(repr(arr))

array([('2018-03-07T03:35:00.000000000', 62.18999863, 62.18999863, 62.16999817, 62.18000031),
       ('2018-03-07T03:36:00.000000000', 62.18000031, 62.18000031, 62.15999985, 62.18000031),
       ('2018-03-07T03:37:00.000000000', 62.16999817, 62.22000122, 62.16999817, 62.20999908)],
      dtype=[('Date', '<M8[ns]'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8'), ('f4', '<f8')])
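Fields of a structured array are accessed by name, so arr["Date"] above should give a plain datetime64[ns] array and arr["f4"] a float64 array. As an alternative not mentioned in the answer, pandas' DataFrame.to_records() builds a similar NumPy record array straight from the frame, using the index name (here Date) as the first field. A sketch, again on a small hypothetical frame:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_ohlc, with a DatetimeIndex named "Date"
idx = pd.date_range("2018-03-07 03:35", periods=3, freq="min", name="Date")
df = pd.DataFrame({"open": [62.19, 62.18, 62.17],
                   "close": [62.18, 62.18, 62.21]}, index=idx)

# to_records() includes the index as the first field by default
rec = df.to_records()
print(rec.dtype.names)  # ('Date', 'open', 'close')

# Field access by name returns per-column arrays that keep their own dtypes
dates = rec["Date"]
close = rec["close"]
print(dates.dtype, close.dtype)
```

This avoids writing the dtype list by hand, at the cost of field names that follow the DataFrame's column names rather than f1 to f4.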