Why is the shape of a single-row numpy structured array empty, i.e. (), and what's the common workaround?

import io
import numpy as np

fileWrapper = io.StringIO("-0.09469 0.032987 0.061009 0.0588")

# Structured dtype with two fields, each holding two floats.
dt = np.dtype([('min', (float, 2)), ('max', (float, 2))])
a = np.loadtxt(fileWrapper, dtype=dt, delimiter=" ", comments="#")
print(np.shape(a), a)

Output: () ([-0.09469, 0.032987], [0.061009, 0.0588])

  • What's the result of print a after a = np.loadtxt? Commented Apr 14, 2015 at 13:55
  • Added the output above. Commented Apr 14, 2015 at 13:57
  • This somewhat inconsistent behaviour complicates the code, since it has to distinguish between single-row arrays and larger ones. Commented Apr 14, 2015 at 13:58
  • () is the shape information for a single-element 0-d array. Commented Apr 14, 2015 at 20:33

1 Answer

Short answer: Add the argument ndmin=1 to the loadtxt call.

Long answer:

The shape is () for the same reason that reading a single floating point value with loadtxt returns an array with shape ():

In [43]: a = np.loadtxt(['1.0'])

In [44]: a.shape
Out[44]: ()

In [45]: a
Out[45]: array(1.0)

By default, loadtxt uses the squeeze function to eliminate trivial (i.e. length 1) dimensions in the array that it returns. In my example above, it means the result is a "scalar array"--an array with shape ().
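
The effect of squeezing can be reproduced without loadtxt. This is only a small illustration of what np.squeeze does to a length-1 array; the variable names are made up for the example:

```python
import numpy as np

one_element = np.array([1.0])        # one-dimensional, length 1
squeezed = np.squeeze(one_element)   # the length-1 axis is removed

print(one_element.shape)   # (1,)
print(squeezed.shape)      # ()
print(squeezed.ndim)       # 0
```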

When you give loadtxt a structured dtype, the structure defines the fields of a single element of the array. It is common to think of these fields as "columns", but structured arrays will make more sense if you consistently think of them as what they are: arrays of structures with fields. If your data file had two lines, the array returned by loadtxt would be an array with shape (2,). That is, it is a one-dimensional array with length 2. Each element of the array is a structure whose fields are defined by the given dtype. When the input file has only a single line, the array would have shape (1,), but loadtxt squeezes that to be a scalar array with shape ().
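
As a rough sketch of the difference described above, here is the same structured dtype read from two lines and from one line. The field names x and y and the sample values are arbitrary choices for the illustration:

```python
import numpy as np

# Hypothetical two-field structure, used only for this illustration.
dt = np.dtype([('x', np.float64), ('y', np.float64)])

two_rows = np.loadtxt(['1.0 2.0', '3.0 4.0'], dtype=dt)
one_row = np.loadtxt(['1.0 2.0'], dtype=dt)

print(two_rows.shape)   # (2,): one element per input line
print(two_rows['x'])    # field 'x' of every element
print(one_row.shape)    # (): the single element was squeezed to 0-d
print(one_row['x'])     # fields are still accessible by name
```

Note that the number of fields never shows up in the shape. Only the number of elements (input lines) does.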

To force loadtxt to always return a one-dimensional array, even when there is a single line of data, use the argument ndmin=1.

For example, here's a dtype for a structured array:

In [58]: dt = np.dtype([('x', np.float64), ('y', np.float64)])

Read one line using that dtype. The result has shape ():

In [59]: a = np.loadtxt(['1.0 2.0'], dtype=dt)

In [60]: a.shape
Out[60]: ()

Use ndmin=1 to ensure that even an input with a single line results in a one-dimensional array:

In [61]: a = np.loadtxt(['1.0 2.0'], dtype=dt, ndmin=1)

In [62]: a.shape
Out[62]: (1,)

In [63]: a
Out[63]: 
array([(1.0, 2.0)], 
      dtype=[('x', '<f8'), ('y', '<f8')])
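
If the array comes from code you cannot change, np.atleast_1d applied after loading should give the same shape. This is an alternative sketch, not part of the loadtxt approach above:

```python
import numpy as np

dt = np.dtype([('x', np.float64), ('y', np.float64)])

a = np.loadtxt(['1.0 2.0'], dtype=dt)   # shape () because ndmin was not given
a1 = np.atleast_1d(a)                   # promote the 0-d array to 1-d

print(a.shape)    # ()
print(a1.shape)   # (1,)
for row in a1:    # iteration now works the same as for multi-line input
    print(row['x'], row['y'])
```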

1 Comment

very concise answer!!
