
I am writing a Python program that imports thousands of data points in blocks of 10 points at a time. For each block of 10 data points, the maximum of that block is found; then the program loops to the next 10 data points and continues. All of this works fine. I just need an array to hold the maximum data points that are produced once per loop, so I can plot them later. How can I build this array within the loop? Here is what I have:

for count in range(self.files // self.block_length):
    RSS = np.fromfile(self.hfile2, dtype=self.datatype, count=self.block_length)
    MaxRSS = np.max(RSS)  # Takes the greatest value in the block of size block_length

Here MaxRSS works fine for saving to a file or printing to the screen as the program loops; however, at the end of the loop it only holds the last value, and I need something that holds all of the maximum values found.

  • This could be done with a list comprehension, but really you should be using a NumPy array as @unutbu says. [np.fromfile(self.hfile2, dtype=self.datatype, count=self.block_length).max() for count in range(self.files // self.block_length)] Commented Apr 7, 2014 at 0:28

2 Answers


Instead of looping over 10 points at a time, if you have enough memory to read the entire dataset into an array, you could reshape the array into a 2D array with 10 values per row and then take the max along each row:

In [59]: x = np.arange(50)

In [60]: x.reshape(-1, 10)
Out[60]: 
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49]])

In [61]: x.reshape(-1, 10).max(axis=1)
Out[61]: array([ 9, 19, 29, 39, 49])
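Applied to a binary file, the same reshape-and-max idea might look like the sketch below. It assumes the file holds raw float32 values (the question's `self.datatype` is not shown) and, like the question's `range(self.files // self.block_length)` loop, it silently drops an incomplete final block, because `reshape` requires the length to be an exact multiple of the block length. The function name `block_maxima` and the file name are hypothetical.

```python
import os
import tempfile

import numpy as np


def block_maxima(path, dtype=np.float32, block_length=10):
    """Read the whole file into memory, then return the max of each full block."""
    data = np.fromfile(path, dtype=dtype)
    n_blocks = data.size // block_length  # an incomplete trailing block is dropped
    blocks = data[:n_blocks * block_length].reshape(n_blocks, block_length)
    return blocks.max(axis=1)  # one maximum per row, i.e. per block


# Demo: write 55 float32 values to a temporary file; the last 5 form an incomplete block.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rss.bin")  # hypothetical file name
    np.arange(55, dtype=np.float32).tofile(path)
    maxima = block_maxima(path)

print(maxima.tolist())
```

The result is a NumPy array of maxima that can be passed straight to a plotting call.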

2 Comments

200k rep o_0 what's the secret?
Opening the file using numpy.memmap would eliminate the requirement that the file be fully read into memory...
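A minimal sketch of the `numpy.memmap` suggestion, again assuming raw float32 data and dropping an incomplete final block; `block_maxima_memmap` is a hypothetical name. The file is mapped read-only, so the OS pages data in as the reduction touches it rather than the whole file being read up front, and only the small array of maxima is copied into ordinary memory.

```python
import os
import tempfile

import numpy as np


def block_maxima_memmap(path, dtype=np.float32, block_length=10):
    """Map the file read-only instead of reading it, then reduce each full block."""
    data = np.memmap(path, dtype=dtype, mode="r")
    n_blocks = data.size // block_length  # an incomplete trailing block is dropped
    blocks = data[:n_blocks * block_length].reshape(n_blocks, block_length)
    # np.array(...) makes a plain in-memory copy of the (small) result.
    return np.array(blocks.max(axis=1))


# Demo with a temporary file of 55 float32 values.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rss.bin")  # hypothetical file name
    np.arange(55, dtype=np.float32).tofile(path)
    mm_maxima = block_maxima_memmap(path)

print(mm_maxima.tolist())
```

Note that `np.memmap` cannot map an empty file, so a zero-length input would need a separate check.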

Not sure if this answers what you want... Assuming your for loop correctly breaks the thousands of points into chunks of 10 (which I can't confirm from the example), then to keep every maximum you need to make MaxRSS a list, create it once before the loop, and append each block's maximum to it:

MaxRSS = []  # created once, before the loop
for count in range(self.files // self.block_length):
    RSS = np.fromfile(self.hfile2, dtype=self.datatype, count=self.block_length)
    MaxRSS.append(np.max(RSS))  # one maximum per block
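A self-contained version of this list-and-append loop is sketched below. It assumes the data is raw float32 in a file opened in binary mode; `np.fromfile` called on an open file object reads from the current position, so successive calls walk through the file block by block. The function name and file name are hypothetical, and the loop stops at the first block shorter than `block_length` (end of file).

```python
import os
import tempfile

import numpy as np


def block_maxima_loop(path, dtype=np.float32, block_length=10):
    """Read the file one block at a time and collect each block's max."""
    maxima = []  # create the list once, *before* the loop
    with open(path, "rb") as f:
        while True:
            # Each call continues from where the previous one stopped.
            block = np.fromfile(f, dtype=dtype, count=block_length)
            if block.size < block_length:  # end of file or an incomplete last block
                break
            maxima.append(block.max())
    return np.array(maxima)  # convert once at the end, e.g. for plotting


# Demo: 55 float32 values -> 5 full blocks plus 5 leftover values.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rss.bin")  # hypothetical file name
    np.arange(55, dtype=np.float32).tofile(path)
    loop_maxima = block_maxima_loop(path)

print(loop_maxima.tolist())
```

Appending to a Python list is cheap, so collecting values this way and converting to an array once at the end is usually fine for thousands of blocks.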

EDIT:

This is not NumPy, but it may help:

import random

data = []
for _ in range(100):
    data.append(random.randint(1, 100))
# OK, data is populated with 100 integers.

# Grab chunks of 10 "points"
chunks = [data[x:x + 10] for x in range(0, len(data), 10)]

# Initialization for the example done. Now, to your max list:
maxes = []
for chunk in chunks:
    maxes.append(max(chunk))
    print("The max number in chunk %s was: %s" % (chunk, maxes[-1]))
print(maxes)  # prints the 10 max values of the 10 chunks of 10 numbers
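The chunking and the max list can also be combined into a single list comprehension. This is a sketch in plain Python; `chunk_maxes` is a hypothetical helper name, and unlike the file-based loops it keeps a shorter final chunk if the length is not a multiple of the chunk size.

```python
import random


def chunk_maxes(values, size=10):
    """Return the max of each consecutive slice of `size` items (last slice may be shorter)."""
    return [max(values[i:i + size]) for i in range(0, len(values), size)]


data = [random.randint(1, 100) for _ in range(100)]
print(chunk_maxes(data))  # 10 maxima, one per chunk of 10
print(chunk_maxes(list(range(25))))  # the last chunk holds only 20..24
```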

4 Comments

Sounds close, but I get this error: AttributeError: 'numpy.float32' object has no attribute 'append'
@eltel2910, that's because MaxRSS is still not a list (but a numpy.float32). Did you initialize it to [] (or to a Numpy array, maybe?)
I added RSS_result = [] and changed to RSS_result.append(MaxRSS). This gets rid of the error, but when I do print RSS_result after the loop I only get the last value
@eltel2910 did you add the initialization RSS_result = [] outside your for loop? Because that's how it should be (otherwise, you're gonna be initializing it on every iteration)
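Initializing to a NumPy array, as mentioned in the comments, works differently: NumPy arrays have no in-place `append`, so the usual idiom is to preallocate one slot per block with `np.empty` before the loop and fill slot `i` on iteration `i`. The sketch below uses an in-memory array for self-containment; in the question's code each block would instead come from the `np.fromfile` call. The function name is hypothetical.

```python
import numpy as np


def block_maxima_prealloc(data, block_length=10):
    """Fill a preallocated array with the max of each full block."""
    n_blocks = len(data) // block_length
    maxima = np.empty(n_blocks, dtype=data.dtype)  # allocated once, outside the loop
    for i in range(n_blocks):
        block = data[i * block_length:(i + 1) * block_length]
        maxima[i] = block.max()  # fill slot i; nothing is appended
    return maxima


pre_maxima = block_maxima_prealloc(np.arange(55, dtype=np.float32))
print(pre_maxima.tolist())
```

This requires knowing the number of blocks in advance, which the question's loop already does via `self.files // self.block_length`.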
