0

My data file looks like this:

[Image: screenshot of the data file]

I want to load this data into a numpy array. How do I do that?

If I use loadtxt(filename), it gives the error:

raise ValueError(errmsg)
ValueError: Some errors were detected !

If I use genfromtxt(filename, delimiter=" "), it gives the same error, even though this was supposed to fix it.

If I use the following:

from array import array
N=84 # max number of columns in any row in the data file
with open('C:/Users/hp1/Desktop/ClusterAnalysis/hierarchical_result.txt',"r") as f:
        all_data=[x.split() for x in f.readlines()]
        a=array([map(int,x) for x in all_data[:N]])

I get this error:

TypeError: array() argument 1 must be a unicode character, not list

EDIT: This is all of the data in the data file:

61 81
2 28
13 31
59 64
36 63
45 58
3 73
47 51
33 68
1 72
12 84
3 73 12 84
1 72 3 73 12 84
6 83
27 42
66 6 83
54 77
60 54 77
39 40
10 19
49 79
22 76
61 81 60 54 77
65 61 81 60 54 77
8 65 61 81 60 54 77
66 6 83 8 65 61 81 60 54 77
71 47 51
18 25
59 64 18 25
32 59 64 18 25
11 34
20 26
27 42 20 26
69 27 42 20 26
16 62
43 16 62
30 45 58
85 30 45 58
56 85 30 45 58
17 11 34
22 76 32 59 64 18 25
29 39 40
14 57
44 14 57
7 24
78 2 28
15 37
70 15 37
48 70 15 37
80 29 39 40
4 9
75 43 16 62
13 31 75 43 16 62
74 13 31 75 43 16 62
36 63 17 11 34
53 36 63 17 11 34
46 1 72 3 73 12 84
23 52
38 66 6 83 8 65 61 81 60 54 77
82 38 66 6 83 8 65 61 81 60 54 77
10 19 56 85 30 45 58
33 68 10 19 56 85 30 45 58
5 49 79
78 2 28 4 9
55 80 29 39 40
67 55 80 29 39 40
7 24 67 55 80 29 39 40
35 48 70 15 37
69 27 42 20 26 35 48 70 15 37
41 82 38 66 6 83 8 65 61 81 60 54 77
50 69 27 42 20 26 35 48 70 15 37
33 68 10 19 56 85 30 45 58 41 82 38 66 6 83 8 65 61 81 60 54 77
7 24 67 55 80 29 39 40 50 69 27 42 20 26 35 48 70 15 37
46 1 72 3 73 12 84 33 68 10 19 56 85 30 45 58 41 82 38 66 6 83 8 65 61 81 60 54 77
22 76 32 59 64 18 25 46 1 72 3 73 12 84 33 68 10 19 56 85 30 45 58 41 82 38 66 6 83 8 65 61 81 60 54 77
7 24 67 55 80 29 39 40 50 69 27 42 20 26 35 48 70 15 37 22 76 32 59 64 18 25 46 1 72 3 73 12 84 33 68 10 19 56 85 30 45 58 41 82 38 66 6 83 8 65 61 81 60 54 77
53 36 63 17 11 34 7 24 67 55 80 29 39 40 50 69 27 42 20 26 35 48 70 15 37 22 76 32 59 64 18 25 46 1 72 3 73 12 84 33 68 10 19 56 85 30 45 58 41 82 38 66 6 83 8 65 61 81 60 54 77
78 2 28 4 9 53 36 63 17 11 34 7 24 67 55 80 29 39 40 50 69 27 42 20 26 35 48 70 15 37 22 76 32 59 64 18 25 46 1 72 3 73 12 84 33 68 10 19 56 85 30 45 58 41 82 38 66 6 83 8 65 61 81 60 54 77
74 13 31 75 43 16 62 78 2 28 4 9 53 36 63 17 11 34 7 24 67 55 80 29 39 40 50 69 27 42 20 26 35 48 70 15 37 22 76 32 59 64 18 25 46 1 72 3 73 12 84 33 68 10 19 56 85 30 45 58 41 82 38 66 6 83 8 65 61 81 60 54 77
44 14 57 74 13 31 75 43 16 62 78 2 28 4 9 53 36 63 17 11 34 7 24 67 55 80 29 39 40 50 69 27 42 20 26 35 48 70 15 37 22 76 32 59 64 18 25 46 1 72 3 73 12 84 33 68 10 19 56 85 30 45 58 41 82 38 66 6 83 8 65 61 81 60 54 77
5 49 79 44 14 57 74 13 31 75 43 16 62 78 2 28 4 9 53 36 63 17 11 34 7 24 67 55 80 29 39 40 50 69 27 42 20 26 35 48 70 15 37 22 76 32 59 64 18 25 46 1 72 3 73 12 84 33 68 10 19 56 85 30 45 58 41 82 38 66 6 83 8 65 61 81 60 54 77
71 47 51 5 49 79 44 14 57 74 13 31 75 43 16 62 78 2 28 4 9 53 36 63 17 11 34 7 24 67 55 80 29 39 40 50 69 27 42 20 26 35 48 70 15 37 22 76 32 59 64 18 25 46 1 72 3 73 12 84 33 68 10 19 56 85 30 45 58 41 82 38 66 6 83 8 65 61 81 60 54 77
23 52 71 47 51 5 49 79 44 14 57 74 13 31 75 43 16 62 78 2 28 4 9 53 36 63 17 11 34 7 24 67 55 80 29 39 40 50 69 27 42 20 26 35 48 70 15 37 22 76 32 59 64 18 25 46 1 72 3 73 12 84 33 68 10 19 56 85 30 45 58 41 82 38 66 6 83 8 65 61 81 60 54 77
21 23 52 71 47 51 5 49 79 44 14 57 74 13 31 75 43 16 62 78 2 28 4 9 53 36 63 17 11 34 7 24 67 55 80 29 39 40 50 69 27 42 20 26 35 48 70 15 37 22 76 32 59 64 18 25 46 1 72 3 73 12 84 33 68 10 19 56 85 30 45 58 41 82 38 66 6 83 8 65 61 81 60 54 77
9
  • What exactly are you expecting? What should the shape of the array be? NumPy has true multidimensional arrays and does not support jagged arrays. EDIT: well, it does support jagged arrays if you use dtype=object, but you essentially lose most of the nice functionality of numpy. Commented Jun 26, 2017 at 7:43
  • Look at all_data to make sure it is all strings that can be converted to int, then try x = [int(i) for i in all_data]. Is that all numbers? Then np.array(x) should work, producing a 1d array of integers. Commented Jun 26, 2017 at 7:49
  • @juanpa.arrivillaga I don't know, programming is not exactly my thing. I am just having to use it for this particular work. Commented Jun 26, 2017 at 7:50
  • @Kristada673 well you need to specify the output you are expecting, or else how could we help you? Commented Jun 26, 2017 at 7:51
  • @hpaulj I tried np.array(x), didn't work Commented Jun 26, 2017 at 7:51

4 Answers

1

If you want to pad each row with the max number of columns, you have to implement it yourself. Something to the effect:

import numpy as np

def pad_list(lst, padding, default=0):
    return lst + (padding - len(lst))*[default]

N = 85  # max number of columns in any row in the data file
with open('/path/to/file', "r") as f:
    all_data = (map(int, x.split()) for x in f)
    a = np.array([pad_list(list(x), N) for x in all_data])

However, for this to give you a numeric array rather than an object-dtype array, N must be at least the actual maximum number of columns, so take care to determine it correctly.
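
Rather than hard-coding N, the maximum row length can be read off the data itself. A minimal sketch, assuming the whole file fits in memory; the inline `text` sample stands in for the real file, and `load_padded` is a hypothetical helper name, not part of the original answer:

```python
import numpy as np

def pad_list(lst, length, default=0):
    # Extend lst with `default` until it has `length` items.
    return lst + (length - len(lst)) * [default]

def load_padded(lines, default=0):
    # Parse each non-empty line into a list of ints.
    rows = [[int(tok) for tok in line.split()] for line in lines if line.strip()]
    # The widest row determines the number of columns.
    n_cols = max(len(row) for row in rows)
    return np.array([pad_list(row, n_cols, default) for row in rows])

# Small sample standing in for the data file (an assumption for illustration).
text = "61 81\n2 28\n13 31\n3 73 12 84\n6 83\n"
a = load_padded(text.splitlines())
print(a.shape)  # (5, 4)
```

With a real file, the open file object can be passed directly, since iterating over it yields lines: `with open(path) as f: a = load_padded(f)`.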


3 Comments

No, if you count the numbers in the last line of the data (which contains the max number of columns), you will see that it's 85. (My mistake writing N=84, it should be N=85; but not N=165)
@Kristada673 I may have made an error, but in any event, if you have the correct number it should work
@Kristada673 I would consider using the zip_longest approach. If only because it is more elegant, but it's probably more efficient too.
1

numpy.genfromtxt does not handle variable-length rows, so you have to parse the text file yourself.

There is no need for the array module. In Python 3.x you can do the following:

import numpy as np
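N = 85 # max number of columns in any row in the data file
with open('C:/Users/hp1/Desktop/ClusterAnalysis/hierarchical_result.txt', "r") as f:
    all_data = [x.split() for x in f]
# The rows have different lengths, so request an object array explicitly;
# recent NumPy versions raise ValueError for ragged input otherwise.
output = np.array([list(map(int, x))[:N] for x in all_data], dtype=object)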
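
To make the result of this approach concrete, here is a small sketch of what an array built from rows of unequal length looks like. It uses a made-up three-row sample rather than the real file and passes dtype=object explicitly. The result is a 1-D array whose elements are Python lists, not a 2-D integer array:

```python
import numpy as np

# Made-up sample rows of unequal length (not the real data file).
rows = [[61, 81], [3, 73, 12, 84], [6, 83]]

ragged = np.array(rows, dtype=object)
print(ragged.shape)  # (3,)
print(ragged.dtype)  # object

# Each element is an ordinary Python list, so per-row work needs a loop
# or comprehension instead of vectorised NumPy operations.
row_lengths = [len(r) for r in ragged]
print(row_lengths)   # [2, 4, 2]
```

If a rectangular numeric array is what you need, the rows have to be padded to a common length first.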

3 Comments

This gives the following error: raise ValueError('A 2-dimensional array must be passed.'). ValueError: A 2-dimensional array must be passed.
@Kristada673 It works without problems in my environment. Would you mind posting the all_data format?
@Kristada673 It works without problems in my environment... paste it and try again, print the all_data variable and see what's wrong...
1

I have used pandas for this problem, since it lets you specify the desired columns. If a row has fewer values than that, the missing entries are set to NaN. You have to know the maximum number of columns, but that is easily determined using readlines, split and a list comprehension.
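
The answer does not say which pandas function it has in mind. One possible reading, sketched here as an assumption, is pandas.read_csv with an explicit list of column names. The inline `text` sample stands in for the real file:

```python
import io

import pandas as pd

# Small sample standing in for the data file (an assumption for illustration).
text = "61 81\n2 28\n13 31\n3 73 12 84\n6 83\n"

# Find the widest row with split and a generator expression.
n_cols = max(len(line.split()) for line in text.splitlines())

# Supplying n_cols column names lets read_csv accept rows of different
# lengths; values missing from short rows become NaN.
df = pd.read_csv(
    io.StringIO(text),
    sep=r"\s+",
    header=None,
    names=list(range(n_cols)),
)

arr = df.to_numpy()
print(arr.shape)  # (5, 4)
```

Because NaN is a floating-point value, the resulting array is float rather than int. Use `df.fillna(0).astype(int)` before converting if an integer array with zero padding is preferred.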


1
In [306]: with open('stack44755004.txt') as f:
     ...:     lines = f.readlines()
     ...:     
In [307]: strs = [line.split() for line in lines]
In [308]: strs
Out[308]: [['61', '81'], ['2', '28'], ['13', '31'], ['3', '73', '12', '84'], ['6', '83']]
In [309]: nums = [[int(i) for i in line.split()]for line in lines]
In [310]: nums
Out[310]: [[61, 81], [2, 28], [13, 31], [3, 73, 12, 84], [6, 83]]

nums is a list of lists of numbers. Because the sublists have different lengths, it can't be turned directly into a 2d array of numbers.

But with a plain read I get a string with newlines:

In [311]: with open('stack44755004.txt') as f:
     ...:     alldata = f.read()

In [312]: alldata
Out[312]: '61 81\n2 28\n13 31\n3 73 12 84\n6 83\n'

split treats the newline like a space, so I get a flat list of strings:

In [313]: alldata.split()
Out[313]: ['61', '81', '2', '28', '13', '31', '3', '73', '12', '84', '6', '83']

np.array can convert that to an array of integers:

In [314]: np.array(alldata.split(),int)
Out[314]: array([61, 81,  2, 28, 13, 31,  3, 73, 12, 84,  6, 83])

This method loses all the line information. Is that important?
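
If the line information does matter, one option (not something the steps above do, just a sketch under that assumption) is to record each line's length alongside the flat array and use np.split to recover the rows:

```python
import numpy as np

# Same five-line sample as in the session above.
text = "61 81\n2 28\n13 31\n3 73 12 84\n6 83\n"
lines = text.splitlines()

flat = np.array(text.split(), dtype=int)
lengths = [len(line.split()) for line in lines]

# Split positions are the running totals of the row lengths,
# excluding the final total (which is just the end of the array).
rows = np.split(flat, np.cumsum(lengths)[:-1])
print([r.tolist() for r in rows])
```

This keeps the values in one contiguous integer array while still allowing each original line to be reconstructed.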

There are ways of turning nums into an array. For example it could be written into a zero padded array. But if you don't know what you want, I'm not sure that's worth the trouble.
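
As a minimal sketch of the zero-padded option, assuming 0 is an acceptable fill value for your analysis, preallocate an array of zeros and copy each row into its left-hand side:

```python
import numpy as np

# nums as produced in the session above.
nums = [[61, 81], [2, 28], [13, 31], [3, 73, 12, 84], [6, 83]]

width = max(len(row) for row in nums)
padded = np.zeros((len(nums), width), dtype=int)
for i, row in enumerate(nums):
    # Fill the first len(row) columns; the rest stay 0.
    padded[i, :len(row)] = row

print(padded)
```

If 0 can also occur as a real value in the data, a fill value such as -1 (with `np.full` instead of `np.zeros`) makes the padding easier to tell apart.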


There have been various padding questions. One tool that I recall off the top of my head is itertools.zip_longest (Python3 version):

In [317]: itertools.zip_longest(*nums, fillvalue=0)
Out[317]: <itertools.zip_longest at 0xa9c46194>
In [318]: list(itertools.zip_longest(*nums, fillvalue=0))
Out[318]: [(61, 2, 13, 3, 6), (81, 28, 31, 73, 83), (0, 0, 0, 12, 0), (0, 0, 0, 84, 0)]
In [319]: np.array(_)
Out[319]: 
array([[61,  2, 13,  3,  6],
       [81, 28, 31, 73, 83],
       [ 0,  0,  0, 12,  0],
       [ 0,  0,  0, 84,  0]])
In [320]: _.T
Out[320]: 
array([[61, 81,  0,  0],
       [ 2, 28,  0,  0],
       [13, 31,  0,  0],
       [ 3, 73, 12, 84],
       [ 6, 83,  0,  0]])
