I have a dataframe with 1000 rows and 1000 columns. I am trying to generate a NumPy array from that dataframe using a for loop that randomly selects 5 columns per cycle. I need to append or concatenate each array (1000 rows and 5 columns) that I generate per cycle. However, it seems that it is not possible to create a NumPy array without first specifying its dimensions.
I have tried the following code:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.choice([0.0, 0.05], size=(1000,1000)))
l = np.array([])
for i in range(0,100):
    rand_cols = np.random.permutation(df.columns)[0:5]
    df2 = df[rand_cols].copy()
    l = np.append(l, df2, axis=0)
However, I get the following error:
ValueError: all the input arrays must have same number of dimensions
This code summarizes what I am doing. For this example, the outcome I need is an array of 1000 rows and 500 columns, built by concatenating side by side the arrays generated in each cycle of the for loop.
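For reference, here is a minimal sketch of one way that target shape could be produced: collect each cycle's block in a Python list and join them once along the column axis. The names `chunks` and `result` are illustrative and not part of my original code. It relies on the observation that `np.array([])` is 1-D while each 5-column block is 2-D, and that stacking along `axis=0` would add rows rather than columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.choice([0.0, 0.05], size=(1000, 1000)))

chunks = []  # illustrative name: holds one (1000, 5) array per cycle
for _ in range(100):
    # Pick 5 distinct column labels at random, as in the original loop
    rand_cols = np.random.permutation(df.columns)[:5]
    chunks.append(df[rand_cols].to_numpy())

# Join the blocks side by side (axis=1) in a single call
result = np.concatenate(chunks, axis=1)
print(result.shape)
```

Building the list first and concatenating once also avoids copying the growing array on every cycle, which is what repeated `np.append` calls would do.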