I have 2 arrays to concatenate:

X_train's shape is (3072, 50000), and y_train's shape is (50000,).

I'd like to concatenate them so I can shuffle the indices all in one go. I have tried the following, but neither works:

np.concatenate([X_train, np.transpose(y_train)])
np.column_stack([X_train, np.transpose(y_train)])

How can I concatenate them?

  • Concatenate to what? You have the input dimensions; what output dimensions do you want? (From an ML perspective I don't see this making sense.) Commented Feb 5, 2018 at 16:37
  • Can't you just reshape y_train to (1, 50000)? Commented Feb 5, 2018 at 16:38
  • @DavidG Yes, thanks! Btw, why do I get (50000,) in the first place? Is that a numpy array? Seems like it's some kind of vector or list, idk. I'm new to numpy Commented Feb 5, 2018 at 16:41
  • This post might help with the difference between the two. Commented Feb 5, 2018 at 16:45
  • In numpy, 1-d arrays are just as useful as 2-d (or higher). Commented Feb 5, 2018 at 17:06

2 Answers

A recommendation aimed at the underlying task rather than the concatenation problem itself: don't do this!

Assuming X holds your samples / observations and y holds your targets:

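Just generate a random permutation and index both arrays with it. Indexing with an integer array produces shuffled copies rather than views, but the originals are left unmodified and no combined array is ever built, e.g. (untested):

import numpy as np

X = np.random.random(size=(50000, 3072))
y = np.random.random(size=50000)

perm = np.random.permutation(X.shape[0])  # assuming X.shape[0] == y.shape[0]
X_perm = X[perm]  # integer-array indexing returns a shuffled copy, not a view
y_perm = y[perm]  # same permutation, so samples and targets stay aligned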

Reminder: your starting shapes are not compatible with most Python-based ML tools, as the usual interpretation is:

  • first-dim / rows: samples
  • second-dim / cols: features

Since the number of samples must equal the number of target values in y, my example is consistent in this regard, while yours needs a transpose on X.
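A minimal sketch of the same idea applied to the question's features-first layout, assuming each column of X_train corresponds to one entry of y_train. The sizes are shrunk and the data is synthetic, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable

n_features, n_samples = 4, 10   # small stand-ins for 3072 and 50000
X_train = rng.random((n_features, n_samples))  # features x samples, as in the question
y_train = np.arange(n_samples)                 # one target per column of X_train

perm = rng.permutation(n_samples)

# Option A: keep the original layout and shuffle the columns.
X_cols_shuffled = X_train[:, perm]
y_shuffled = y_train[perm]

# Option B: move to the samples x features layout first, then shuffle the rows.
X_rows_shuffled = X_train.T[perm]

print(X_cols_shuffled.shape, X_rows_shuffled.shape, y_shuffled.shape)
```

Both options apply the same permutation to X and y, so each sample stays paired with its target without the two arrays ever being joined.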


As DavidG said, I realized the answer is that y_train has shape (50000,), so I needed to reshape it before concatenating:

np.concatenate([X_train, np.reshape(y_train, (1, 50000))])

Still, this evaluated very slowly in Jupyter. If there's a faster answer, I'd be grateful to have it.
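For reference, a small sketch of why the transpose attempts in the question fail, plus a few equivalent ways to add the missing axis. The sizes are shrunk and the data is made up. With the real arrays, any of these still allocates a new (3073, 50000) array and copies everything into it, which is one plausible reason for the slowness, though that is an assumption and not something measured here:

```python
import numpy as np

X_train = np.zeros((4, 10))   # stand-in for the (3072, 50000) array
y_train = np.arange(10.0)     # stand-in for the (50000,) array

# Transposing a 1-D array is a no-op: there is only one axis to reorder.
print(np.transpose(y_train).shape)

# Three equivalent ways to append y_train as one extra row.
a = np.concatenate([X_train, y_train.reshape(1, -1)])
b = np.concatenate([X_train, y_train[np.newaxis, :]])
c = np.vstack([X_train, y_train])  # vstack promotes 1-D inputs to a single row

print(a.shape)
```

If the only goal is shuffling, indexing X_train and y_train with one shared permutation avoids building the combined array at all.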
