
For some reason I need to use astropy to convert comoving distance to redshift. Basically this involves reading in, looping through, and writing out a list or a numpy array. My problem is that each of my lists typically has ~ 9.5 x 10^6 elements, and I get a MemoryError every time I try to save the output to a new txt file using numpy.savetxt. The memory usage grows quickly, eventually slows down a bit, but always ends up exceeding the 128 GB limit I have.

If anybody has any idea how I can improve the script below, I am very willing to listen. Thank you!

import sys
import numpy
import astropy.units as unit
from astropy.cosmology import FlatLambdaCDM, z_at_value

cosmo = FlatLambdaCDM(H0=70, Om0=0.3)

inFile=sys.argv[1]
outFile=sys.argv[2]

# Third column of the input file holds the comoving distances (in kpc)
comovingDistance = numpy.loadtxt(inFile, usecols=(2,))

# Invert the distance-redshift relation one element at a time
Redshift = numpy.zeros(len(comovingDistance))
for i in range(len(comovingDistance)):
    Redshift[i] = z_at_value(cosmo.comoving_distance, comovingDistance[i] * unit.kpc)

numpy.savetxt(outFile, Redshift, fmt='%1.8e')

Below is the error log file:

Traceback (most recent call last):
  File "comoving2redshift.py", line 21, in <module>
    Redshift[i] = z_at_value(cosmo.comoving_distance, comovingDistance[i] * unit.kpc)
  File "/afs/mpa/home/minh/.local/lib/python2.7/site-packages/astropy/cosmology/funcs.py", line 119, in z_at_value
    fval_zmax = func(zmax)
  File "/afs/mpa/home/minh/.local/lib/python2.7/site-packages/astropy/cosmology/core.py", line 1195, in comoving_distance
    return self._comoving_distance_z1z2(0, z)
  File "/afs/mpa/home/minh/.local/lib/python2.7/site-packages/astropy/cosmology/core.py", line 1219, in _comoving_distance_z1z2
    return self._hubble_distance * vectorize_if_needed(f, z1, z2)
  File "/afs/mpa/home/minh/.local/lib/python2.7/site-packages/astropy/units/quantity.py", line 924, in __mul__
    return super(Quantity, self).__mul__(other)
  File "/afs/mpa/home/minh/.local/lib/python2.7/site-packages/astropy/units/quantity.py", line 368, in __array_prepare__
    from .quantity_helper import UNSUPPORTED_UFUNCS, UFUNC_HELPERS
MemoryError
  • At which line are you getting MemoryError? Commented Apr 25, 2017 at 12:40
  • @Divakar I added the error log file in the question... Commented Apr 25, 2017 at 13:15
  • The note in the docs of z_at_value seems relevant to your case; you may want to try that approach. Also, have you checked whether z_at_value is vectorized? It is very likely that you don't need to iterate yourself and can simply do Redshift = z_at_value(cosmo.comoving_distance, comovingDistance * unit.kpc), without needing to allocate the Redshift array either. Commented Apr 25, 2017 at 13:30
  • @Evert Thanks! I thought about it too, and that is in fact what I did, since there doesn't seem to be a cleverer way. Commented Apr 25, 2017 at 23:28
  • @Jaime Thank you for pointing me to the astropy documentation. I had read it too quickly and missed the part about computing many redshifts for the same cosmology. Interpolation saves the day indeed (a sketch of that approach follows below). Commented Apr 26, 2017 at 17:30
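
As the comments above suggest, the note in the z_at_value documentation about computing many redshifts for the same cosmology points to interpolation. Below is a minimal, untested sketch of that idea: evaluate the cosmology once on a coarse redshift grid, then map every distance onto the grid with numpy.interp. The grid limits (here 0 to 5) and the number of grid points are assumptions that have to be adapted to the actual range of the catalogue; the distances are assumed to be in kpc, as in the question.

import sys
import numpy
import astropy.units as unit
from astropy.cosmology import FlatLambdaCDM

cosmo = FlatLambdaCDM(H0=70, Om0=0.3)

inFile = sys.argv[1]
outFile = sys.argv[2]

comovingDistance = numpy.loadtxt(inFile, usecols=(2,))

# Evaluate the distance-redshift relation once, on a coarse grid
# (assumed redshift range and grid size; adjust to your data)
zgrid = numpy.linspace(1e-4, 5.0, 2000)
dgrid = cosmo.comoving_distance(zgrid).to(unit.kpc).value

# comoving_distance is monotonic in z, so a simple 1-D interpolation
# inverts it; all input distances must lie within the grid's range
Redshift = numpy.interp(comovingDistance, dgrid, zgrid)

numpy.savetxt(outFile, Redshift, fmt='%1.8e')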

2 Answers


I don't know of a solution intrinsic to numpy, but you can save some memory by writing each result to the file as soon as it is computed, instead of after the loop. That avoids allocating the Redshift array, as well as the allocations numpy.savetxt() does behind the scenes when it formats the floating-point values as strings.

inFile=sys.argv[1]
outFile=sys.argv[2]

comovingDistance = numpy.loadtxt(inFile, usecols=(2,))

with open(outFile, 'w') as fp:
    for distance in comovingDistance:
        fp.write("{:1.8e}\n".format(
            z_at_value(cosmo.comoving_distance, distance * unit.kpc)))

(NB: untested)




As an alternative to my other suggested solution, you can split the input file, iterate over the resulting set of (temporary) input files, and concatenate the corresponding output files at the end. Below is a bash wrapper script that, from the outside, behaves identically to the Python script in the question (one input file argument, one output file argument).

#! /bin/bash

nlines=10000
input=$1
output=$2

# use a unique prefix!
prefix='tmpsplit'
split --lines=$nlines "$input" "$prefix"

# Assume we split into at most 26^2 files;
# this is the default limit for split anyway
for filename in ${prefix}??
do
    outfile="${filename}-out"
    ./calcdist.py "$filename" "$outfile"
done

# This assumes the shell orders the glob expansion alphabetically
cat ${prefix}*out > "$output"

# Clean up
rm ${prefix}*

You may want to use a temporary directory, instead of relying on a unique prefix.
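
A rough, untested sketch of that temporary-directory variant, reusing the nlines, input and output variables from the script above and a throwaway directory created with mktemp -d:

tmpdir=$(mktemp -d)
split --lines=$nlines "$input" "$tmpdir/chunk"

for filename in "$tmpdir"/chunk??
do
    ./calcdist.py "$filename" "${filename}-out"
done

# split's default alphabetical suffixes keep the output order correct
cat "$tmpdir"/chunk??-out > "$output"

# Clean up the whole directory
rm -r "$tmpdir"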

