
I can't post the data being imported because it's too large. But it has both number and string fields, and is 5543 rows by 137 columns. I import the data with this code (ndnames and ndtypes hold the column names and column datatypes):

npArray2 = np.genfromtxt(fileName, 
                        delimiter="|", 
                        skip_header=1, 
                        dtype=(ndtypes), 
                        names=ndnames, 
                        usecols=np.arange(0,137)
                        )

This works and the resulting variable type is "void7520" with size (5543,). But this is really a 1D array of 5543 rows, where each element holds a sub-array that has 137 elements. I want to convert this into a normal numpy array of 5543 rows and 137 columns. How can this be done?

I have tried the following (using Pandas):

pdArray = pd.read_csv(fileName, 
                      sep=ndelimiter,
                      index_col=False, 
                      skiprows=1,
                      names=ndnames
                      )
npArray = pdArray.values  # equivalent to the deprecated DataFrame.as_matrix()

But the resulting npArray has dtype object and shape (5543, 137), which at first looks promising. But because its dtype is object, many NumPy functions can't operate on it. Can this object array be converted into a normal numpy array?

Edit: ndtypes looks like... [int,int,...,int,'|U50',int,...,int,'|U50',int,...,int] That is, 135 number fields with two string-type fields somewhere in the middle.

2 Comments
  • For the read_csv section, it looks like what you want is to read the data as a specific dtype (int, float, etc.) instead of object. Have you tried the dtype parameter of the read_csv function? Commented Nov 4, 2016 at 20:08
  • @Peng - see my edit above. Essentially, it is 137 fields, where 135 are numbers and 2 are |U50 fields. Commented Nov 4, 2016 at 20:57

1 Answer


npArray2 is a 1d structured array, 5543 elements and 137 fields.

What does npArray2.dtype look like, or equivalently what is ndtypes? The dtype is built from the types and names that you provided. "void7520" just identifies a record of this array by its itemsize (7520 bytes); it tells us little about the field types.

If all fields of the dtype are numeric, better yet if they are all the same numeric dtype (int, float), then it is fairly easy to convert it to a 2d array with 137 columns (2nd dim). astype and view can be used.
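A minimal sketch of that view-and-reshape conversion, assuming a small structured array whose fields are all the same dtype (the field names here are hypothetical, not from the question):

```python
import numpy as np

# Hypothetical structured array: 2 records, 3 float64 fields each
dt = np.dtype([('a', 'f8'), ('b', 'f8'), ('c', 'f8')])
arr = np.array([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)], dtype=dt)

# view reinterprets the same buffer as plain float64 (flattening the
# fields), then reshape restores one row per original record
arr2d = arr.view(np.float64).reshape(arr.shape[0], -1)
print(arr2d.shape)  # (2, 3)
```

This only works because every field shares one dtype; with mixed field types the buffer can't be reinterpreted as a single plain dtype.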

(edit - it has both number and string fields - you can't convert it to a 2d array of numbers; it could be an array of strings, but you can't do numeric math on strings.)

But if the dtypes are mixed then you can't convert it. All elements of a 2d array have to be the same dtype. You have to use the structured array approach if you want mixed types. (Well, there is dtype=object, but let's not go there.)

Actually pandas is going the object route. Evidently it thinks the only way to make an array from this data is to let each element be its own type. And the math of object arrays is severely limited. They are, in effect, a glorified (or debased) list.
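One way to sidestep the object array in pandas is to let read_csv infer a dtype per column and then keep only the numeric columns; the file contents and column names below are made up for illustration:

```python
import numpy as np
import pandas as pd
from io import StringIO

# Hypothetical pipe-delimited data, analogous to the question's file
data = StringIO("1|foo|10\n2|bar|20\n")
df = pd.read_csv(data, sep='|', header=None, names=['x', 'name', 'y'])

# pandas infers one dtype per column, so selecting only the numeric
# columns yields a regular numeric ndarray instead of an object array
num = df.select_dtypes(include=[np.number]).to_numpy()
print(num.shape)  # (2, 2)
```

Because pandas stores each column with its own dtype, this avoids the one-dtype-per-element fallback that produces the object array.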


3 Comments

Try loading as int without the string columns.
I have to load the string columns too, though. I would try to split the load into two arrays, but can't because of the issues I mentioned. Here's a question: if the dtype of the array is object and the fields are numeric (with some NaNs), how can that be converted to a regular array?
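To the question in that last comment: if an object array holds only numbers and NaNs, astype(float) should produce a regular float array, since NaN is itself a float. A minimal sketch:

```python
import numpy as np

# Hypothetical object array holding numbers and a NaN
obj = np.array([[1, 2.5], [np.nan, 4]], dtype=object)

# astype(float) converts element by element to a plain float64 array;
# NaN survives the conversion because it is a valid float value
flt = obj.astype(float)
print(flt.dtype)  # float64
```

This fails with a ValueError if any element is a non-numeric string, which is why the string columns have to be separated out first.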
