I have a Python list where each element is a 2D NumPy array of shape (20, 22). I need to convert the list to a NumPy array, but doing np.array(my_list) literally eats the RAM, and so does np.asarray(my_list).
The list has around 7M samples, so instead of converting my list to a NumPy array, I was thinking of starting with a NumPy array and appending the 2D arrays to it one by one.
I can't find a way of doing that with NumPy. My aim is to start with something like this:
numpy_array = np.array([])
df_values = df.to_numpy()  # faster than df.values
for x in df_values:
    if condition:
        start_point += 20
        end_point += 20
        features = df_values[start_point:end_point]  # 20 rows, 22 columns
        np.append(numpy_array, features)
As you can see above, after each iteration the shape of numpy_array should change like this:
first iteration: (1, 20, 22)
second iteration: (2, 20, 22)
third iteration: (3, 20, 22)
N iteration: (N, 20, 22)
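To make that concrete, here is a tiny toy example (with random stand-in data) of the shape progression I mean. Note that np.append has to be reassigned to its result, and with axis=0 it copies the whole array on every call, which is part of what I am trying to avoid:

import numpy as np

numpy_array = np.empty((0, 20, 22))    # start empty, but with the right trailing shape
for i in range(3):
    features = np.random.rand(20, 22)  # stand-in for a real (20, 22) slice
    numpy_array = np.append(numpy_array, features[None], axis=0)  # returns a new, copied array
    print(numpy_array.shape)           # (1, 20, 22), then (2, 20, 22), then (3, 20, 22)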
Update:
Here is my full code:
import time

from tqdm import tqdm

def get_X(df_values):
    x = []  # np.array([], dtype=np.object)
    y = []  # np.array([], dtype=int32)
    counter = 0
    start_point = 20
    previous_ticker = None
    index = 0
    time_1 = time.time()
    df_length = len(df_values)
    for row in tqdm(df_values):
        if 0 <= start_point < df_length:
            ticker = df_values[start_point][0]
            flag = row[30]
            if index == 0:
                previous_ticker = ticker
            if ticker != previous_ticker:
                counter += 20
                start_point += 20
                previous_ticker = ticker
            features = df_values[counter:start_point]
            x.append(features)
            y.append(flag)
            # np.append(x, features)
            # np.append(y, flag)
            counter += 1
            start_point += 1
            index += 1
        else:
            break
    print("Time to finish the loop", time.time() - time_1)
    return x, y

x, y = get_X(df.to_numpy())
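Note that after this loop x is still a Python list of 2D arrays, so I would still hit the same np.array(x) memory problem at the end. What I am trying to get to is something like the rough sketch below, where the output is preallocated once and each window is written into it directly. The names (get_X_preallocated, n_windows, window_size) are made up for illustration, and the ticker-jump logic from my real code is left out:

import numpy as np

def get_X_preallocated(df_values, n_windows, window_size=20):
    # n_windows must be known (or over-estimated) up front and must not
    # exceed len(df_values) - window_size
    n_cols = df_values.shape[1]
    x = np.empty((n_windows, window_size, n_cols), dtype=df_values.dtype)
    y = np.empty(n_windows, dtype=df_values.dtype)
    for i in range(n_windows):
        x[i] = df_values[i:i + window_size]  # written straight into the preallocated block
        y[i] = df_values[i][30]              # column 30 assumed to hold the flag, as above
    return x, y

# hypothetical usage: x, y = get_X_preallocated(df.to_numpy(), n_windows=len(df) - 20)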