I am trying to create three tensors for my language-translation LSTM network:
import numpy as np
Num_samples = 50000
Time_step = 100
Vocabulary = 5000
# One-hot tensors of shape (samples, time steps, vocabulary size)
shape = (Num_samples, Time_step, Vocabulary)
encoder_input_data = np.zeros(shape, dtype='float32')
decoder_input_data = np.zeros(shape, dtype='float32')
decoder_target_data = np.zeros(shape, dtype='float32')
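A quick back-of-the-envelope check shows why the allocation above fails. This sketch assumes float32 storage (4 bytes per entry) and decimal gigabytes. It only does the arithmetic and allocates nothing.

```python
# Dense float32 storage cost of one (Num_samples, Time_step, Vocabulary) array
Num_samples = 50000
Time_step = 100
Vocabulary = 5000
bytes_per_float32 = 4

elements = Num_samples * Time_step * Vocabulary   # 2.5e10 entries per array
bytes_per_array = elements * bytes_per_float32    # 1e11 bytes per array
total_gb = 3 * bytes_per_array / 1e9              # encoder input, decoder input, decoder target

print(f"one array: {bytes_per_array / 1e9:.0f} GB, all three: {total_gb:.0f} GB")
# prints "one array: 100 GB, all three: 300 GB"
```

If every time step holds exactly one 1, only 1 entry in 5000 is nonzero, which is why a sparse representation is attractive here.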
Obviously, my machine doesn't have enough memory for this. Since the data consists of one-hot vectors, using csc_matrix() from scipy.sparse seemed like the solution, as suggested in this thread and this thread.
But after trying csc_matrix() and csr_matrix(), it seems they only support 2D arrays.
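Because scipy.sparse matrices are limited to 2D, one common workaround is to merge the sample and time-step axes into a single row axis, giving a (Num_samples * Time_step, Vocabulary) matrix with exactly one nonzero per row, and to reshape back to 3D only when a dense batch is needed. The sketch below uses csr_matrix (row slicing is cheap in CSR format) with small toy sizes and randomly generated token indices standing in for real data; the helpers sample_as_dense and batch_as_dense are hypothetical names. At the full sizes from the question it would store 5 million nonzeros instead of 2.5e10 dense entries.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Small illustrative sizes; the real data would use 50000, 100, 5000.
num_samples, time_step, vocabulary = 4, 6, 10

# Hypothetical token indices: one integer word id per (sample, time step).
rng = np.random.default_rng(0)
tokens = rng.integers(0, vocabulary, size=(num_samples, time_step))

# Flatten (sample, time step) into one row axis: row = sample * time_step + t.
rows = np.arange(num_samples * time_step)
cols = tokens.ravel()                      # C order matches the row formula above
data = np.ones(rows.size, dtype=np.float32)
onehot_2d = csr_matrix((data, (rows, cols)),
                       shape=(num_samples * time_step, vocabulary))

def sample_as_dense(i):
    """Recover the dense (time_step, vocabulary) one-hot block for sample i."""
    return onehot_2d[i * time_step:(i + 1) * time_step].toarray()

def batch_as_dense(start, stop):
    """Recover a dense (batch, time_step, vocabulary) slice for model input."""
    block = onehot_2d[start * time_step:stop * time_step].toarray()
    return block.reshape(stop - start, time_step, vocabulary)

print(onehot_2d.shape, onehot_2d.nnz)     # (24, 10) 24
```

Only one batch at a time is ever densified, so peak memory scales with the batch size rather than with Num_samples.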
Older threads from six years ago did discuss this issue, but they are not oriented toward machine learning.
My question is: is there any Python library or tool that can help me create sparse 3D arrays for storing one-hot vectors for later machine learning use?
scipy.sparse matrices are used in learning, such as in the sklearn package. But no, they have not been expanded to 3D.

… scipy.sparse matrices? All of my machine learning experience has involved creating a 3D array at the very beginning that holds the batch, time_step, and one-hot vector length dimensions, so I am open to new ways of handling the training data.
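One alternative way to handle the training data is to never build the full one-hot tensors: store each sentence as integer token indices of shape (Num_samples, Time_step), which takes about 20 MB per array as int32 instead of 100 GB, and expand to one-hot only one batch at a time. The sketch below is a minimal generator under a few assumptions: the function names are hypothetical, decoder_target is taken to be decoder_input shifted left by one time step (the usual teacher-forcing setup), and token 0 is assumed to be the padding index used to fill the last target step.

```python
import numpy as np

def one_hot(indices, vocabulary):
    """Expand an int array (batch, time_step) to float32 one-hot (batch, time_step, vocabulary)."""
    out = np.zeros(indices.shape + (vocabulary,), dtype=np.float32)
    # Put a 1.0 at each token's index along the last axis
    np.put_along_axis(out, indices[..., np.newaxis], 1.0, axis=-1)
    return out

def batch_generator(encoder_tokens, decoder_tokens, vocabulary, batch_size):
    """Yield ([encoder_input, decoder_input], decoder_target), one batch at a time.

    encoder_tokens, decoder_tokens: int arrays of shape (num_samples, time_step).
    Assumption: the target is the decoder input shifted left by one step, with
    the final step set to token 0 (assumed padding index).
    """
    num_samples = encoder_tokens.shape[0]
    for start in range(0, num_samples, batch_size):
        stop = min(start + batch_size, num_samples)
        enc = encoder_tokens[start:stop]
        dec = decoder_tokens[start:stop]
        target = np.zeros_like(dec)
        target[:, :-1] = dec[:, 1:]          # shift left by one time step
        yield ([one_hot(enc, vocabulary), one_hot(dec, vocabulary)],
               one_hot(target, vocabulary))
```

With batch_size=64 and the sizes from the question, each yielded tensor is 64 * 100 * 5000 float32 values (about 128 MB), so memory depends on the batch size, not on Num_samples. If the model is built in Keras, a further option is to feed the integer arrays directly through an Embedding layer and use the sparse_categorical_crossentropy loss, which accepts integer targets, so no one-hot tensors are needed at all.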