I have a table with the columns ['datetime', 'sensorid', 'sms-in', 'sms-out', 'call-in', 'call-out'], covering 10,000 sensors in total. Ideally, there are 10,000 rows for each timestamp. However, some timestamps may be missing the rows for a few sensors (e.g., only 9,998 rows instead of 10,000).
The table looks like this:
                              sms-in   sms-out   call-in  call-out
datetime            sensorid
2013-10-31 23:00:00 1         0.223227  0.156787  0.160938  0.052275
                    2         0.222201  0.147617  0.164946  0.054712
                    3         0.221109  0.137855  0.169213  0.057306
                    4         0.226198  0.183349  0.149327  0.045216
                    5         0.205065  0.175393  0.139139  0.043455
...                                ...       ...       ...       ...
2013-11-01 22:50:00 9996      0.695404  0.440369  0.087566  0.310581
                    9997      0.687958  0.429974  0.085995  0.243143
                    9998      0.687958  0.429974  0.085995  0.256862
                    9999      0.894907  0.518741  0.085995  0.230476
                    10000     1.212911  0.638219  0.085995  0.090769

[1439982 rows x 4 columns]
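To make the setup reproducible at a small scale, here is a minimal sketch of a toy frame with the same layout, plus a quick check for which timestamps are incomplete. The sizes (3 timestamps, 4 sensors), the random values, and the names `toy`, `rows_per_timestamp` and `incomplete` are assumptions for illustration only; the real table has 10,000 sensors.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real table: 3 timestamps x 4 sensors (hypothetical sizes).
times = pd.date_range("2013-10-31 23:00:00", periods=3, freq="10min")
sensors = np.arange(1, 5)
features = ["sms-in", "sms-out", "call-in", "call-out"]
full_index = pd.MultiIndex.from_product(
    [times, sensors], names=["datetime", "sensorid"]
)

rng = np.random.default_rng(0)
toy = pd.DataFrame(
    rng.random((len(full_index), len(features))),
    index=full_index,
    columns=features,
)

# Remove one row (second timestamp, sensor 3) to mimic a missing sensor.
keep = np.ones(len(toy), dtype=bool)
keep[6] = False
toy = toy[keep]

# Count rows per timestamp; anything below the sensor count is incomplete.
rows_per_timestamp = toy.groupby(level="datetime").size()
incomplete = rows_per_timestamp[rows_per_timestamp < len(sensors)]
print(incomplete)
```

Only the second timestamp should show up in `incomplete`, with a count one short of the number of sensors.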
Let the last 4 columns ['sms-in', 'sms-out', 'call-in', 'call-out'] be the features of a sensor. Let T and N represent the timestamp and sensorid axes, respectively.
How do I convert the DataFrame into a NumPy array with the shape (T, N, 4)? I tried a trivial approach that collects the rows iteratively, which is very inefficient. Is there a pandas API or a concise way to do this?
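For reference, one possible direction is sketched below, under two assumptions: the frame is indexed by ('datetime', 'sensorid') as in the printout (otherwise `set_index` would be needed first), and missing rows may be filled with NaN. The helper name `to_tnf_array` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def to_tnf_array(df, sensor_ids=None):
    """Return an array of shape (T, N, F) from a (datetime, sensorid)-indexed frame.

    Rows missing from `df` become NaN in the result (an assumption of this sketch).
    """
    times = df.index.get_level_values("datetime").unique().sort_values()
    if sensor_ids is None:
        sensor_ids = df.index.get_level_values("sensorid").unique().sort_values()
    # Every (timestamp, sensor) pair, ordered timestamp-major.
    full_index = pd.MultiIndex.from_product(
        [times, sensor_ids], names=["datetime", "sensorid"]
    )
    dense = df.reindex(full_index)  # absent rows are inserted as NaN
    return dense.to_numpy().reshape(len(times), len(sensor_ids), df.shape[1])


# Toy data: 3 timestamps x 4 sensors, with one row removed.
times = pd.date_range("2013-10-31 23:00:00", periods=3, freq="10min")
index = pd.MultiIndex.from_product(
    [times, range(1, 5)], names=["datetime", "sensorid"]
)
cols = ["sms-in", "sms-out", "call-in", "call-out"]
toy = pd.DataFrame(
    np.arange(len(index) * 4, dtype=float).reshape(-1, 4),
    index=index,
    columns=cols,
)
keep = np.ones(len(toy), dtype=bool)
keep[6] = False  # second timestamp, sensor 3
toy = toy[keep]

arr = to_tnf_array(toy)
print(arr.shape)
```

On the real table, passing `sensor_ids=range(1, 10001)` would reserve a slot for every sensor, including any that never appear at all.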