
I know MATLAB has some nice syntax where you can put array definitions into a file, like A = [[1,2,3],..., and then import that file so that all those definitions are read automatically. I would like to do something similar in Python. Basically, I'm looking for the easiest way to read tabular data from a file and get the results as numpy arrays. What's the easiest (or most Pythonic) way to accomplish this?

Say the data in the file is as follows:

Array1
1 0 0 0
2 1 0 0
3 0.3333333333325028 0 0
4 0.6666666666657888 0 0

Array2
1 1 1 1
2 3 1 1
3 2 2 2
4 3 2 2
5 1 1 3
6 1 3 4
7 1 4 2
  • The csv format is convenient when all rows have the same number of columns, and you want one array (or table). But with multiple arrays like this the csv format is awkward. Commented Jun 30, 2016 at 20:50

3 Answers


file test1.py:

#!/usr/bin/python
a=[1,2,3,4,5,6]

file test.py:

#!/usr/bin/python

import test1

print(test1.a)

Now if you run test.py:

$ ./test.py
[1, 2, 3, 4, 5, 6]

Comments

Thanks for answering. I've been looking at this approach, but I don't like the fact that the data lives in a module file. Is there another way? Also, if the data is not properly formatted, I would like to be able to read it anyway without it breaking.
This is the easiest way AFAIK. If you need additional features, like a custom text file with a custom format or loosely typed data, then it won't be the easiest any more.
It may be the easiest, but it's also quite dangerous, isn't it? Anyone could write malicious code there, and it would be executed.
Yes, it's dangerous; you shouldn't use it unless you have complete control over the data.
I don't unfortunately, so I can't use your approach. Do you know if I can use pandas for this?
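On the safety concern raised in these comments: if the data has to stay in Python-literal syntax but the file isn't trusted, one option (a swapped-in technique from the standard library, not what this answer proposes) is to parse the file with `ast` and evaluate only the literal right-hand sides with `ast.literal_eval`, which never executes code. A minimal sketch, assuming the file contains nothing but simple `name = [...]` assignments; `load_literal_arrays` is a hypothetical helper:

```python
import ast

import numpy as np


def load_literal_arrays(source: str) -> dict:
    """Parse `name = [...]` assignments without executing the file.

    Hypothetical helper: only single-target assignments whose right-hand
    side is a Python literal are accepted; anything else raises ValueError.
    """
    arrays = {}
    tree = ast.parse(source)
    for stmt in tree.body:
        if not (isinstance(stmt, ast.Assign)
                and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            raise ValueError(f"unsupported statement on line {stmt.lineno}")
        # literal_eval only accepts literals (lists, numbers, strings, ...),
        # so a call such as os.system(...) raises ValueError instead of running.
        value = ast.literal_eval(stmt.value)
        arrays[stmt.targets[0].id] = np.array(value)
    return arrays


source = """
a = [1, 2, 3, 4, 5, 6]
b = [[1, 0.5], [2, 0.25]]
"""
data = load_literal_arrays(source)
print(data["a"])        # [1 2 3 4 5 6]
print(data["b"].shape)  # (2, 2)
```

The file is read as text (for example with `open(path).read()`), so nothing in it is imported or run.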

What Jahid suggested works well if you want to put your data in Python modules.

If, on the other hand, you'd rather put your data in a separate file (e.g. a text file) and read it from a script, you may want to use numpy.loadtxt, which is designed to read matrix-like text files into numpy arrays.

http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html
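`numpy.loadtxt` reads one table per call, so a file like the one in the question, with several named blocks, has to be split first. A minimal sketch, assuming each block starts with a non-numeric name line and blank lines only separate blocks (the layout shown in the question); `read_named_blocks` is a hypothetical helper, not part of NumPy:

```python
import numpy as np


def read_named_blocks(text: str) -> dict:
    """Split text into blocks headed by a name line; load each with loadtxt."""
    blocks = {}
    name = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines only separate blocks
        try:
            float(stripped.split()[0])
        except ValueError:
            # A line whose first token is not a number starts a new block.
            name = stripped
            blocks[name] = []
            continue
        if name is None:
            raise ValueError("numeric data found before any block name")
        blocks[name].append(stripped)
    # loadtxt accepts a list of lines; ndmin=2 keeps one-row blocks 2-D.
    return {key: np.loadtxt(rows, ndmin=2) for key, rows in blocks.items()}


text = """Array1
1 0 0 0
2 1 0 0
3 0.3333333333325028 0 0
4 0.6666666666657888 0 0

Array2
1 1 1 1
2 3 1 1
3 2 2 2
4 3 2 2
5 1 1 3
6 1 3 4
7 1 4 2
"""

arrays = read_named_blocks(text)
print(arrays["Array1"].shape)  # (4, 4)
print(arrays["Array2"].shape)  # (7, 4)
```

Every block comes back as float64 here; if a block is known to hold only integers, `np.loadtxt(rows, dtype=int, ndmin=2)` can be used for it instead.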

4 Comments

I found this article where this function is not recommended. What about Pandas?
How are you going to use the data once it's in your workspace? If Pandas dataframes and series make sense to you, then use the Pandas loader. But if all you are using is numpy, then stick with loadtxt (or genfromtxt). The only reason that article gave for using Pandas is speed.
I basically need numpy arrays after I read the data, though the arrays have different types. I'll try with those functions again.
@aaragon: pandas will deal much better with complex data, data with errors, data with comments, data with dates, with larger files, etc. loadtxt is good for smaller amounts of very simple data, but pandas is better if you have anything remotely complex or if you have very large amounts of data.
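On reading data that is not properly formatted: `numpy.genfromtxt` is more forgiving than `loadtxt`. With its default `loose=True`, fields that cannot be converted become `nan`, and with `invalid_raise=False`, rows with the wrong number of columns are skipped with a warning instead of aborting the read. A small sketch on made-up malformed input:

```python
import io
import warnings

import numpy as np

# Made-up input: row 3 has a non-numeric field, row 4 has too few columns.
raw = """1 1 1 1
2 3 1 1
3 oops 2 2
4 3
5 1 1 3
"""

with warnings.catch_warnings():
    # genfromtxt warns about the skipped row; silence it for this demo.
    warnings.simplefilter("ignore")
    data = np.genfromtxt(io.StringIO(raw), invalid_raise=False)

print(data.shape)  # (4, 4): the short row was dropped
print(data[2])     # the "oops" field was read as nan
```

Dropping rows silently can hide real problems, so in practice it may be better to let the warning through and inspect which lines were skipped.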

What you probably want is to put your data in the YAML file format. It is a text data format whose structure is modeled on the data types of higher-level scripting languages like Python. You can put multiple 2D arrays of arbitrary types in it. However, since it is just data, not code, it isn't as dangerous as putting the data directly in a Python script, provided you read it with a safe loader such as yaml.safe_load. It can easily represent 2D arrays, or more strictly nested lists (look specifically at example 2.5 at that link), as well as the equivalent of ordinary lists, dicts, nested dicts, strings, and any combination thereof. Since you can nest one data type in another, you can have a dictionary of 2D arrays, for example, which lets you put multiple arrays in a single file.

Here is your example in YAML:

Array1:
- [1, 0, 0, 0]
- [2, 1, 0, 0]
- [3, 0.3333333333325028, 0, 0]
- [4, 0.6666666666657888, 0, 0]

Array2:
- [1, 1, 1, 1]
- [2, 3, 1, 1]
- [3, 2, 2, 2]
- [4, 3, 2, 2]
- [5, 1, 1, 3]
- [6, 1, 3, 4]
- [7, 1, 4, 2]

And here is how to read it into numpy arrays (the file is called "temp.yaml" in my example), using the PyYAML package:

>>> import numpy as np
>>> import yaml
>>>
>>> with open('temp.yaml') as ym:
...     res = yaml.safe_load(ym)
>>> res
{'Array1': [[1, 0, 0, 0],
  [2, 1, 0, 0],
  [3, 0.3333333333325028, 0, 0],
  [4, 0.6666666666657888, 0, 0]],
'Array2': [[1, 1, 1, 1],
  [2, 3, 1, 1],
  [3, 2, 2, 2],
  [4, 3, 2, 2],
  [5, 1, 1, 3],
  [6, 1, 3, 4],
  [7, 1, 4, 2]]}
>>> array1 = np.array(res['Array1'])
>>> array2 = np.array(res['Array2'])
>>> print(array1)
[[ 1.          0.          0.          0.        ]
 [ 2.          1.          0.          0.        ]
 [ 3.          0.33333333  0.          0.        ]
 [ 4.          0.66666667  0.          0.        ]]
>>> print(array2)
[[1 1 1 1]
 [2 3 1 1]
 [3 2 2 2]
 [4 3 2 2]
 [5 1 1 3]
 [6 1 3 4]
 [7 1 4 2]]
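Calling `yaml.load` without an explicit `Loader` can construct arbitrary Python objects, and in PyYAML 6 and later it raises a `TypeError`, so `yaml.safe_load` is the call to use for data files. A minimal sketch that turns every top-level entry into an array in one step and writes arrays back out with `yaml.safe_dump`; it assumes PyYAML is installed, that every top-level value is a list of rows, and uses a shortened copy of the example data:

```python
import numpy as np
import yaml

text = """
Array1:
- [1, 0, 0, 0]
- [2, 1, 0, 0]
- [3, 0.3333333333325028, 0, 0]
- [4, 0.6666666666657888, 0, 0]
Array2:
- [1, 1, 1, 1]
- [2, 3, 1, 1]
"""

# safe_load only builds plain lists, dicts, numbers and strings.
raw = yaml.safe_load(text)
arrays = {name: np.array(rows) for name, rows in raw.items()}

print(arrays["Array1"].dtype)  # float64, since the block mixes ints and floats
print(arrays["Array2"].dtype)  # an integer dtype (platform dependent)

# safe_dump cannot serialize NumPy arrays, so convert them to nested lists;
# default_flow_style=None writes each row as [a, b, c, d] like the input.
out = yaml.safe_dump({name: arr.tolist() for name, arr in arrays.items()},
                     default_flow_style=None)
print(out)
```

Each array keeps the dtype NumPy infers from its own block, which fits data where different arrays have different types.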
