
It is well known [1][2] that numpy.loadtxt is not particularly fast at loading simple text files containing numbers.

I have been googling around for alternatives, and of course I stumbled across pandas.read_csv and astropy's io.ascii. However, these readers don't appear to be easy to decouple from their libraries, and I'd like to avoid adding a 200 MB, 5-seconds-import-time gorilla just for reading some ASCII files.

The files I usually read are simple: no missing data, no malformed rows, no NaNs, floating point only, space- or comma-separated. But I need NumPy arrays as output.

Does anyone know if any of the parsers above can be used standalone or about any other quick parser I could use?

Thank you in advance.

[1] Numpy loading csv TOO slow compared to Matlab

[2] http://wesmckinney.com/blog/a-new-high-performance-memory-efficient-file-parser-engine-for-pandas/

[Edit 1]

For the sake of clarity and to reduce background noise: as I stated at the beginning, my ASCII files contain simple floats. No scientific notation, no Fortran-specific data, no funny stuff, nothing but simple floats.

Sample:

```python
import numpy as np

arr = np.random.rand(1000, 100)
np.savetxt('float.csv', arr)
```

  • Similar current question, stackoverflow.com/questions/52232559/…. Not a duplicate since it doesn't have an answer either. Commented Sep 8, 2018 at 19:32
  • Typically what's the shape of the loaded array? Commented Sep 8, 2018 at 19:49
  • Please provide some sample lines. Commented Sep 8, 2018 at 20:12
  • If import times are an issue, I'm wondering if you could save some by just pulling in the relevant parts of pandas.io to avoid grabbing the full API. Commented Sep 8, 2018 at 22:07
  • @hjpauli, it varies wildly, I have a few files containing data that is around 30x3, many others up to 10,000x9. Commented Sep 9, 2018 at 5:03

1 Answer


Personally I just use pandas and astropy for this. Yes, they are big and slow to import, but they are very widely available, and on my machine they import in under a second, so they aren't so bad. I haven't tried it, but I would assume that extracting the CSV reader from pandas or astropy and getting it to build and run standalone isn't easy; probably not a good way to go.

Is writing your own CSV-to-NumPy-array reader an option? If the CSV is simple, it should be doable in ~100 lines of e.g. C or Cython, and if you know your CSV format you can get performance and package size that a generic solution can't beat.
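Even before reaching for C or Cython, a rough illustration of how small such a reader can be in pure Python plus NumPy (the function name fast_loadtxt is made up here, and it assumes exactly the clean, whitespace-separated float files the question describes, like the np.savetxt sample above):

```python
import numpy as np

def fast_loadtxt(filename):
    """Minimal reader for clean, whitespace-separated float files.

    Assumes no missing data, no comments, and every row the same
    width -- exactly the kind of file np.savetxt produces.
    """
    with open(filename) as f:
        text = f.read()
    # The number of columns comes from the first line.
    ncols = len(text[:text.index('\n')].split())
    # One big split plus one array construction avoids the per-line
    # Python overhead that makes np.loadtxt slow.
    return np.array(text.split(), dtype=float).reshape(-1, ncols)
```

For comma-separated files you would first do text.replace(',', ' '); anything messier (missing fields, comments) is where the generic readers earn their weight.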

Another option you could look at is https://odo.readthedocs.io/ . I don't have experience with it, and from a quick look I didn't see a direct CSV -> NumPy path. But it does make fast CSV -> database conversion simple, and there are fast database -> NumPy array options. So it might be possible to get a fast CSV -> in-memory SQLite -> NumPy array pipeline via odo and possibly a second package.
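As a sketch of the database -> NumPy leg of that idea, using only the stdlib sqlite3 module in place of odo (the table name t and the two-column schema are invented for illustration):

```python
import sqlite3
import numpy as np

# Small in-memory table standing in for the CSV -> SQLite step
# that odo would perform.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE t (a REAL, b REAL)')
conn.executemany('INSERT INTO t VALUES (?, ?)',
                 [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)])

# fetchall() returns a list of row tuples, which np.array turns
# straight into a 2-D float array.
arr = np.array(conn.execute('SELECT a, b FROM t').fetchall())
```

Whether the round trip through a database is actually faster than parsing the text directly would need measuring; for files in the 10,000 x 9 range mentioned in the comments it may not pay off.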


1 Comment

Thank you for the suggestions. It seems odo uses pandas under the hood, so back to square one...
