
I am trying to reduce a numpy array to a smaller size by averaging blocks of elements — for example, averaging each 5x5 sub-array of a 100x100 array to produce a 20x20 array. Since I have a huge amount of data to manipulate, what is an efficient way to do that?

1 Comment

  • Similar to this answer as well. Commented Dec 17, 2017 at 3:54

5 Answers


I have only tried this on smaller arrays, so test it with yours:

import numpy as np

nbig = 100
nsmall = 20
big = np.arange(nbig * nbig).reshape([nbig, nbig]) # 100x100

small = big.reshape([nsmall, nbig//nsmall, nsmall, nbig//nsmall]).mean(3).mean(1)

An example with 6x6 -> 3x3:

nbig = 6
nsmall = 3
big = np.arange(36).reshape([6,6])
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35]])

small = big.reshape([nsmall, nbig//nsmall, nsmall, nbig//nsmall]).mean(3).mean(1)

array([[  3.5,   5.5,   7.5],
       [ 15.5,  17.5,  19.5],
       [ 27.5,  29.5,  31.5]])

1 Comment

Here is a generalization to N-dimensional arrays: stackoverflow.com/a/73078468/3753826
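Along the lines of the comment above, the same reshape trick can be written as a small helper for any number of dimensions. This is a sketch; `block_mean` and its signature are my own naming, not taken from the linked answer:

```python
import numpy as np

def block_mean(a, factors):
    """Average `a` over non-overlapping blocks.

    `factors` gives the block size along each axis and must divide
    the corresponding dimension exactly.
    """
    if len(factors) != a.ndim:
        raise ValueError("need one factor per axis")
    # Turn each axis of length n into a pair of axes (n // f, f),
    # then average over every second axis (the within-block ones).
    shape = []
    for n, f in zip(a.shape, factors):
        if n % f:
            raise ValueError("factor must divide the axis length")
        shape.extend([n // f, f])
    return a.reshape(shape).mean(axis=tuple(range(1, 2 * a.ndim, 2)))

small = block_mean(np.arange(36).reshape(6, 6), (2, 2))  # 3x3 result
```

For the 6x6 example above, `block_mean(big, (2, 2))` gives the same 3x3 result as the two chained `.mean()` calls.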

This is pretty straightforward, although I feel like it could be faster:

from __future__ import division
import numpy as np
Norig = 100
Ndown = 20
step = Norig//Ndown
assert step == Norig/Ndown # ensure Ndown is an integer factor of Norig
x = np.arange(Norig*Norig).reshape((Norig,Norig)) #for testing
y = np.empty((Ndown,Ndown)) # for testing
for yr,xr in enumerate(np.arange(0,Norig,step)):
    for yc,xc in enumerate(np.arange(0,Norig,step)):
        y[yr,yc] = np.mean(x[xr:xr+step,xc:xc+step])

You might also find scipy.signal.decimate interesting. It applies a more sophisticated low-pass filter than simple averaging before downsampling the data, although you'd have to decimate one axis, then the other.
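For reference, a decimate-based variant might look like the following. This is a sketch that assumes SciPy is available; note that `decimate` low-pass filters rather than block-averages, so the values will differ from the mean-pooling results above:

```python
import numpy as np
from scipy.signal import decimate

x = np.random.default_rng(0).normal(size=(100, 100))

# Decimate by 5 along each axis in turn: 100x100 -> 20x100 -> 20x20.
# zero_phase=True (the default) avoids the phase shift of the IIR filter.
y = decimate(decimate(x, 5, axis=0), 5, axis=1)
```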



Average a 2D array over subarrays of size NxN:

height, width = data.shape
data = average(split(average(split(data, width // N, axis=1), axis=-1), height // N, axis=1), axis=-1)

1 Comment

Nice one! Just a clarification that average and split are numpy functions.
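Spelled out as a runnable snippet (the test array and block size `N` here are my own values, not from the answer):

```python
import numpy as np
from numpy import average, split

N = 2
data = np.arange(36).reshape(6, 6).astype(float)
height, width = data.shape

# Split into column blocks and average each, then repeat for row blocks.
data = average(split(average(split(data, width // N, axis=1), axis=-1),
                     height // N, axis=1), axis=-1)
# data is now the 3x3 array of 2x2 block means.
```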

Using reduce() from the einops package:

import numpy as np
from einops import reduce

m, n = 100, 20  # `nbig` and `nsmall` in the accepted answer
big = np.arange(m * m).reshape(m, m)
small = reduce(big.astype(float), "(a1 a2) (b1 b2) -> a1 b1", "mean", a1=n, b1=n)



Note that eumiro's approach does not work for masked arrays as .mean(3).mean(1) assumes that each mean along axis 3 was computed from the same number of values. If there are masked elements in your array, this assumption does not hold any more. In that case, you have to keep track of the number of values used to compute .mean(3) and replace .mean(1) by a weighted mean. The weights are the normalized number of values used to compute .mean(3).

Here is an example:

import numpy as np


def gridbox_mean_masked(data, Nbig, Nsmall):
    # Reshape data
    rshp = data.reshape([Nsmall, Nbig//Nsmall, Nsmall, Nbig//Nsmall])

    # Compute mean along axis 3 and remember the number of values each mean
    # was computed from
    mean3 = rshp.mean(3)
    count3 = rshp.count(3)

    # Compute weighted mean along axis 1
    mean1 = (count3*mean3).sum(1)/count3.sum(1)
    return mean1


# Define test data
big = np.ma.array([[1, 1, 2],
                   [1, 1, 1],
                   [1, 1, 1]])
big.mask = [[0, 0, 0],
            [0, 0, 1],
            [0, 0, 0]]
Nbig = 3
Nsmall = 1

# Compute gridbox mean
print(gridbox_mean_masked(big, Nbig, Nsmall))
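A quick check of the bias described above, using a hypothetical 4x4 example of my own (not from the answer): with an unevenly masked block, the plain `.mean(3).mean(1)` and the count-weighted mean disagree, and only the latter matches the true mean of the block's unmasked values.

```python
import numpy as np

# Hypothetical 4x4 masked array, averaged over 2x2 blocks.
big = np.ma.array([[4., 9., 1., 1.],
                   [0., 0., 1., 1.],
                   [1., 1., 1., 1.],
                   [1., 1., 1., 1.]])
big[0, 1] = np.ma.masked  # top-left block keeps only {4, 0, 0}

rshp = big.reshape([2, 2, 2, 2])
naive = rshp.mean(3).mean(1)  # treats each row-mean as equally weighted

mean3, count3 = rshp.mean(3), rshp.count(3)
weighted = (count3 * mean3).sum(1) / count3.sum(1)

# naive[0, 0] averages the row means (4 and 0) to get 2.0, while
# weighted[0, 0] recovers the true block mean (4 + 0 + 0) / 3 = 4/3.
true_block = big[:2, :2].mean()
```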

