I want to have a numpy array with values and corresponding labels for each value. I am using this array for linear regression and it will be my X data vector in the equation y = Xb + error.

My X vector consists of about 20 variables, each of which I would like to be able to reference by name like so X['variable1']. I was initially using a dictionary to do this but realized that the scikit library for linear regression requires a numpy matrix, so I am trying to build a numpy array that is labeled.

I keep getting an error stating:

TypeError: a bytes-like object is required, not 'int'.

This is what I'm doing:

X = np.array([3],dtype=[('label1','int')])

I eventually want to have 20 labeled values, something like this:

X = np.array([3,40,7,2,...],
             dtype=[('label1','int'),('label2','int'),('label3','int')...])

Would really appreciate any help on the syntax here. Thanks!

2 Answers

5

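The correct way to create a structured array with values is to pass a list of tuples, one tuple per record (even when there is only one field):

In [54]: X=np.array([(3,)],dtype=[('label1',int)])

In [55]: X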
Out[55]: 
array([(3,)], 
      dtype=[('label1', '<i4')])

In [56]: X=np.array([(3,4)],dtype=[('label1',int),('label2',int)])

In [57]: X
Out[57]: 
array([(3, 4)], 
      dtype=[('label1', '<i4'), ('label2', '<i4')])

But I should caution you that such an array is not 2d (or a matrix); it is 1d with fields:

In [58]: X.shape
Out[58]: (1,)

In [59]: X.dtype
Out[59]: dtype([('label1', '<i4'), ('label2', '<i4')])

And you can't do math across fields; X*2 and X.sum() will produce errors. Using X in an equation like y = X*b + error will be hopeless.
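To make that limitation concrete, here is a minimal sketch (the labels and values are made up for illustration). It shows that access by field name works, that arithmetic on the whole structured array fails, and that `numpy.lib.recfunctions.structured_to_unstructured` (available in NumPy 1.16 and later) can turn the fields into an ordinary 2d array if one is needed:

```python
import numpy as np
from numpy.lib.recfunctions import structured_to_unstructured

# One record with three integer fields (hypothetical labels).
X = np.array([(3, 40, 7)],
             dtype=[('label1', int), ('label2', int), ('label3', int)])

first = X['label1']          # access by name works: a 1d array of length 1

# Arithmetic on the whole structured array is not supported.
try:
    X * 2
    multiply_failed = False
except TypeError:
    multiply_failed = True

# Convert the fields into the columns of a plain 2d float array.
X2d = structured_to_unstructured(X, dtype=float)   # shape (1, 3)
```

After the conversion, `X2d` behaves like any numeric array, but the field names are no longer attached to it.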

You are probably better off working with real 2d numeric arrays, and doing the mapping between labels and column numbers in your head or with a dictionary.
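A rough sketch of the dictionary approach, assuming the variables are stored as the columns of a plain 2d float array. The names `labels`, `col`, and the coefficient vector `b` are placeholders, and the numbers are invented:

```python
import numpy as np

labels = ['label1', 'label2', 'label3']            # hypothetical variable names
col = {name: i for i, name in enumerate(labels)}   # label -> column index

# 4 observations x 3 variables, as a plain 2d float array.
X = np.array([[3., 40., 7.],
              [1., 35., 9.],
              [4., 42., 6.],
              [2., 38., 8.]])

label2_values = X[:, col['label2']]   # look a column up by name

b = np.array([0.5, -1.0, 2.0])        # made-up coefficients
y = X @ b                             # ordinary matrix arithmetic works
```

The array stays purely numeric, so it can be passed to a regression routine unchanged, while `col` keeps the names available for lookups.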

Or use Pandas.
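For the Pandas route, something like the following should work, assuming pandas 0.24 or later for `DataFrame.to_numpy`. The column names and values are again invented:

```python
import numpy as np
import pandas as pd

# Each column is one labeled variable; each row is one observation.
df = pd.DataFrame({'label1': [3, 1, 4, 2],
                   'label2': [40, 35, 42, 38],
                   'label3': [7, 9, 6, 8]})

label1_values = df['label1']      # reference a variable by name
X = df.to_numpy(dtype=float)      # plain 2d array, shape (4, 3)
```

The DataFrame keeps the labels next to the data, and `to_numpy` gives the bare matrix whenever a function insists on one.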

1 Comment

Thanks, I went with your first approach and did something like: keyValues = [('A',0), ('R',0), ('N',0)]
0

With only 20 variables, memory is not an issue, so you could keep using dictionaries:

from collections import OrderedDict  # Dictionary that remembers insertion order
import numpy as np

dd = OrderedDict()
dd["Var1"] = 10
dd["Var2"] = 20
dd["Var3"] = 30

# make numpy array from dict:
xx = np.array(list(dd.values()))

# make dict from array:
xx2 = 2*xx
dd2 = OrderedDict(zip(dd.keys(), xx2))
