I am trying to order data and build an array for each unique ID. The data I'm using are columns of integers/floats or empty cells (NaN).

I will paste a simplified version of the code below:

import pandas as pd
import numpy as np

dtypes = {'starttime': 'str', 'endtime': 'str', 'hr': 'float', 'sofa_24hours': 'float'}
parse_dates = [2,3]
fields = [0,1,11,12,13,14,15,34,35,36]
reader = pd.read_csv(filename, header=0, names=headers, dtype=dtypes, parse_dates=parse_dates, usecols=fields)
print("Started loading data...")

df = pd.DataFrame(data=reader)
ids = df.iloc[:, 0].values  # NumPy array, so ids == id_list[i] compares element-wise
id_list = np.unique(ids)
x = df.iloc[:, 2:6].astype(float)
y = df.iloc[:, 7].astype(float)
t = df.iloc[:, 0].astype(float)

x_data = []
y_data = []
t_data = []

for i in range(0,len(id_list)):
    idx = np.where(ids==id_list[i])[0]
    t_data.append(t.values[idx[0]:idx[-1]+1])
    x_data.append(x.values[idx[0]:idx[-1]+1,:])
    y_data.append(y.values[idx[0]:idx[-1]+1])

    if np.mod(i,1000)==0:
        print("Data association... {}%".format(np.round(100*i/len(id_list))))

print("Finished loading data!")

Now, when I check for the type:

In [1]: y.dtype
Out[1]: dtype('float64')

That seems about right. Then I cut the data into batches using:

batch_size=64
W=5

idx_pt = np.random.randint(W,len(x_data),batch_size)
idx_t = [np.random.randint(0,len(x_data[i])-W-1) for i in idx_pt]

batch_x = np.array([x_data[idx_pt[i]][idx_t[i]:idx_t[i]+W,:] for i in range(0,len(idx_pt))])
batch_y = np.array([y_data[idx_pt[i]] for i in range(0,len(idx_pt))])

When I check for dtype:

In [2]: batch_x.dtype
Out[2]: dtype('float64')

In [3]: batch_y.dtype
Out[3]: dtype('O')

Why is batch_y treated as an object?

1 Answer

I suspect the last array (batch_y) was created from a list containing NumPy arrays of different lengths.

I don't have your data, but the following code produces both batch_x and batch_y as object arrays:

import numpy as np

x = np.random.randint(0, high=10, size=[300, 300])
y = np.array(np.random.randint(0, high=10, size=300), dtype=np.float64)

id_list = np.random.randint(0, high=10, size=20)
ids = id_list

x_data = []
y_data = []

for i in range(0,len(id_list)):
    idx = np.where(ids==id_list[i])[0]
    x_data.append(x[idx[0]:idx[-1]+1,:])
    y_data.append(y[idx[0]:idx[-1]+1])


batch_size=64
W=5

idx_pt = np.random.randint(W,abs(len(x_data)),batch_size)
idx_t = [np.random.randint(0,abs(len(x_data[i])-W-1)) for i in idx_pt]

batch_x = np.array([x_data[idx_pt[i]][idx_t[i]:idx_t[i]+W,:] for i in range(0,len(idx_pt))])
batch_y = np.array([y_data[idx_pt[i]] for i in range(0,len(idx_pt))])

The reason is that y_data already contains arrays of different lengths:

>>> y_data[0]
array([0., 9., 9., 8., 2., 1., 7., 7., 8., 0.])
>>> y_data[1]
array([9., 9., 8., 2., 1., 7., 7., 8.])
>>> y_data[3]
array([8., 2., 1., 7., 7.])
>>> y_data[4]
array([2., 1., 7., 7., 8., 0., 0., 1.])

Please check your input dataframe and what you are actually putting into x_data and y_data.
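
As a quick check, the sketch below compares the lengths of the pieces before stacking them. The arrays are made-up stand-ins for y_data, not your data, and piece_lengths is a hypothetical helper. The ragged case is built with an explicit dtype=object because NumPy 1.24 and later raise a ValueError for ragged input instead of silently returning an object array.

```python
import numpy as np

# Hypothetical stand-ins for y_data: one list with equal-length pieces,
# one with ragged pieces.
equal_pieces = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0])]
ragged_pieces = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0])]


def piece_lengths(pieces):
    """Return the set of distinct lengths found in a list of 1-D arrays."""
    return {len(p) for p in pieces}


# Equal lengths stack into a regular 2-D float array.
stacked = np.array(equal_pieces)

# Ragged pieces cannot form a rectangular array. The object container is
# created explicitly so the example behaves the same on every NumPy version.
ragged = np.empty(len(ragged_pieces), dtype=object)
for k, piece in enumerate(ragged_pieces):
    ragged[k] = piece

print(piece_lengths(equal_pieces), stacked.dtype, stacked.shape)
print(piece_lengths(ragged_pieces), ragged.dtype, ragged.shape)
```

If piece_lengths returns more than one value for the list you pass to np.array, the result cannot be a plain float array.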

4 Comments

Correct, y_data contains arrays of different lengths. However, each one has the same length as the matching x_data entry: len(y_data[0]) = 74, len(y_data[1]) = 143, len(x_data[0]) = 74, len(x_data[1]) = 143.
Then batch_y will have object dtype. You can check the dtype of np.array([[1,2,3],[1,2,3]]) and np.array([[1,2,3],[1,2]]): the first one will be int, the second will be object.
For me personally it's hard to guess without the input data (the original dataframe), but if you are indeed creating arrays from chunks of different lengths, then the real question is why batch_x is still float.
I figured it out. It turns out I wasn't cutting y_data into pieces of equal length W. This is the working code: batch_y = np.array([y_data[idx_pt[i]][idx_t[i]+W] for i in range(0,len(idx_pt))]). Now batch_y.dtype is float64.
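
For reference, here is a self-contained sketch of the fix from the last comment, using synthetic data in place of the original dataframe. The series lengths, the 4 feature columns, and the fixed seed are assumptions made for illustration only. Every x window has shape (W, 4) and every y target is a single value, which is why both batches come out as float64.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, purely for reproducibility

# Synthetic per-ID series of different lengths (assumed values).
lengths = [74, 143, 30, 58]
x_data = [rng.random((n, 4)) for n in lengths]
y_data = [rng.random(n) for n in lengths]

batch_size = 64
W = 5

# Pick a random series for each sample, then a random window start that
# leaves room for W input rows plus the target at position start + W.
idx_pt = rng.integers(0, len(x_data), size=batch_size)
idx_t = [rng.integers(0, len(x_data[i]) - W) for i in idx_pt]

# Each x sample is a (W, 4) window; each y sample is one scalar target.
batch_x = np.array([x_data[p][s:s + W, :] for p, s in zip(idx_pt, idx_t)])
batch_y = np.array([y_data[p][s + W] for p, s in zip(idx_pt, idx_t)])

print(batch_x.shape, batch_x.dtype)
print(batch_y.shape, batch_y.dtype)
```

Unlike the snippet in the question, idx_pt is drawn from all series here rather than starting at index W, since the synthetic list only has four entries.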