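import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

def preprocess(numerical, categorical):
    # Fill missing numerical values, then standardize them
    imputer = SimpleImputer()
    x_num = imputer.fit_transform(numerical)
    scaler = StandardScaler()
    x_num = scaler.fit_transform(x_num)
    # One-hot encode the categorical columns
    one_hot = OneHotEncoder()
    x_cat = one_hot.fit_transform(categorical)
    print('X_num Shape : ', x_num.shape)
    print('X_cat Shape : ', x_cat.shape)

    return np.concatenate((x_num, x_cat), axis=1)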
[Output] X_num Shape :  (889, 2)
         X_cat Shape :  (889, 22)

The error it raises at the end is ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 0 dimension(s)

I want the output to be of shape (889,24)

The last part of the message (the array at index 1 has 0 dimensions) makes me think the problem is related to the odd NumPy arrays of shape (n,) and (,n), but that shouldn't be the case, since the printed shapes show that both arrays are 2-D. I think there's something I'm missing.

I've also tried several other functions (np.hstack, np.vstack, np.column_stack), but they either don't give the desired output or raise this error: ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 2 and the array at index 1 has size 1

  • Check the axis argument; try with axis = 0. Commented Jun 21, 2020 at 12:21
  • Tried it; it gives the same error message: ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 0 dimension(s) Commented Jun 21, 2020 at 12:25
  • np.concatenate((x,y),axis=1) works fine for my dummy array; check the dtypes of your arrays. Commented Jun 21, 2020 at 12:29
  • Sparse matrices cannot be joined with np.concatenate. Look at np.array(X_cat).shape. There's your 0-d array. Commented Jun 21, 2020 at 14:59
  • Your one-hot is producing a sparse matrix (check its default parameters). Either change that sparse setting, make the result dense, or use sparse.hstack. This is a tricky error (I've seen it a few times before), but ultimately it comes down to too casual a reading of the documentation. Commented Jun 21, 2020 at 15:15
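
The sparse.hstack option mentioned in the last comment can be sketched as follows. This is only an illustration on made-up data: x_num and x_cat here are small stand-in arrays, not the asker's real features, and wrapping the dense block in csr_matrix first is one choice among several.

```python
import numpy as np
from scipy import sparse

# Stand-in data (hypothetical): 5 rows, 2 numerical and 3 one-hot columns
rng = np.random.default_rng(0)
x_num = rng.random((5, 2))
x_cat = sparse.csr_matrix(np.eye(5, 3))

# Option A: make the sparse block dense, then use np.concatenate
dense = np.concatenate((x_num, x_cat.toarray()), axis=1)

# Option B: keep everything sparse and join with scipy.sparse.hstack
stacked = sparse.hstack([sparse.csr_matrix(x_num), x_cat], format="csr")

print(dense.shape, stacked.shape)
```

Option B avoids materializing the one-hot block as a dense array, which may matter when there are many categories.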

1 Answer


As hpaulj mentioned above, the problem was that x_cat, as returned by OneHotEncoder, was a <class 'scipy.sparse.csr.csr_matrix'> rather than a NumPy array. np.concatenate cannot join a sparse matrix with a NumPy array: it first casts the sparse matrix to a NumPy array, and the shape of that array is (). The matrix is not flattened; NumPy wraps the whole sparse matrix as the single element of a 0-dimensional object array. That is why reshape did not work either and raised ValueError: cannot reshape array of size 1 into shape (889,22)
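
A quick way to see the cast described here, as a minimal sketch on a small stand-in matrix (x_cat below is made up for illustration, not the real encoder output):

```python
import numpy as np
from scipy import sparse

# Stand-in for the encoder output: a 4x3 sparse matrix
x_cat = sparse.csr_matrix(np.eye(4, 3))

# NumPy does not unpack the sparse matrix; it stores it as one object
wrapped = np.asarray(x_cat)
print(type(x_cat))
print(wrapped.shape, wrapped.dtype)

# The sparse matrix's own method yields a regular 2-D array
as_dense = x_cat.toarray()
print(as_dense.shape)
```

Printing type(x_cat) before concatenating is usually enough to spot the problem, since the .shape of a sparse matrix looks exactly like that of a dense array.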

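To solve this, I replaced the line one_hot = OneHotEncoder() with one_hot = OneHotEncoder(sparse = False), which makes the encoder return a dense NumPy array that can be concatenated using np.concatenate((x_num,x_cat),axis = 1). Note that scikit-learn 1.2 renamed this parameter to sparse_output and the old sparse name was removed in 1.4, so on recent versions the equivalent is OneHotEncoder(sparse_output = False).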
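
For reference, here is a sketch of the function with a dense one-hot block. It is a variant of the fix rather than the exact line above: it calls .toarray() on the encoder output instead of passing a constructor flag, so it does not depend on which parameter name the installed scikit-learn version accepts. The sample data is made up for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

def preprocess(numerical, categorical):
    # Impute missing values, then standardize the numerical columns
    x_num = SimpleImputer().fit_transform(numerical)
    x_num = StandardScaler().fit_transform(x_num)
    # One-hot encode; the default output is a scipy sparse matrix
    x_cat = OneHotEncoder().fit_transform(categorical)
    if hasattr(x_cat, "toarray"):
        x_cat = x_cat.toarray()  # convert to a dense NumPy array
    return np.concatenate((x_num, x_cat), axis=1)

# Hypothetical sample: 3 rows, 2 numerical and 2 categorical columns
numerical = np.array([[1.0, 10.0], [np.nan, 20.0], [3.0, 30.0]])
categorical = np.array([["a", "x"], ["b", "y"], ["a", "y"]])
features = preprocess(numerical, categorical)
print(features.shape)  # (3, 6): 2 numerical + 4 one-hot columns
```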
