13

I'd like to read the content of multiple files, process their data individually (because of performance and hardware resources) and write my results into one 'big' netCDF4 file.

Right now I can read the files and process their data, but I struggle with the resulting arrays: I haven't been able to merge them correctly.

I've got a 3D array (time, lon, lat) containing my calculated value for each day. What I'd like to do is merge all of these arrays into one big array (all days in one array) before I write it to my netCDF4 file.

Here are two example arrays:

  • day1[19790101][-25][35]=95
  • day2[19790102][-15][25]=93

My expected result is:

  • allDays[19790101][-25][35]=95
  • allDays[19790102][-15][25]=93

How can I achieve that structure?

  • When I use: allDays=day1+day2 my data will be aggregated.
  • When I use:

    allDays=[]
    allDays.append(day1)
    allDays.append(day2)
    

    my data will be surrounded by a new array.
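The two behaviours described above can be reproduced with a small sketch. The array shapes and fill values here are made up for illustration and are much smaller than real daily grids.

```python
import numpy as np

# Hypothetical stand-ins for two daily results, each shaped (time=1, lon=2, lat=2).
day1 = np.full((1, 2, 2), 95)
day2 = np.full((1, 2, 2), 93)

# "+" on NumPy arrays adds element by element; it does not join the arrays.
summed = day1 + day2
print(summed.shape)      # (1, 2, 2)
print(summed[0, 0, 0])   # 188

# Appending to a list keeps the arrays separate, but wraps them in a Python list.
wrapped = []
wrapped.append(day1)
wrapped.append(day2)
print(type(wrapped), len(wrapped))  # <class 'list'> 2
```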

FYI: I'm using Ubuntu 14.04 and Python: 3.5 (Anaconda)

8
  • By arrays do you mean lists? Commented Mar 13, 2016 at 18:08
  • I work with numpy and print(type(day1))=<class 'numpy.ndarray'> I'm new to python (coming from java) Commented Mar 13, 2016 at 18:11
  • I thought you could be talking about numpy arrays, just wasn't sure Commented Mar 13, 2016 at 18:13
  • Try allDays.append(day1.tolist()) Commented Mar 13, 2016 at 18:14
  • @Reti43 no day2 won't have [19790101] but it could have the same lon & lat Commented Mar 13, 2016 at 18:26

4 Answers

18

With Python 3.5 or later, you can unpack lists into a new list like this:

tst1 = [1, 2, 3]
tst2 = [4, 5, 6]

ts3 = [*tst1, *tst2]

which gives: [1, 2, 3, 4, 5, 6]
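Since the question uses NumPy arrays rather than lists, here is a minimal sketch of how the same unpacking behaves on arrays. The shapes are assumptions chosen for illustration: each day has a leading time axis of length 1.

```python
import numpy as np

# Hypothetical daily arrays shaped (time=1, lon=3, lat=4).
day1 = np.zeros((1, 3, 4))
day2 = np.ones((1, 3, 4))

# Unpacking iterates over the first axis, so each day contributes
# one (3, 4) sub-array to a plain Python list.
unpacked = [*day1, *day2]
print(len(unpacked), unpacked[0].shape)  # 2 (3, 4)

# Converting the list back to an array gives one (time, lon, lat) array.
all_days = np.array(unpacked)
print(all_days.shape)  # (2, 3, 4)
```

The unpacked result is still a list, so the final np.array() call is what produces a single array.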

10

When you do

allDays=[]
allDays.append(day1)
allDays.append(day2)

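You are making a list of references to the existing arrays, rather than repackaging the data. If day1 and day2 were plain Python lists, you could copy them with a slice:

allDays=[]
allDays.append(day1[:])
allDays.append(day2[:])

This puts a (shallow) copy of day1 into the new allDays list. Note that slicing a NumPy array with [:] returns a view rather than a copy, so for arrays use day1.copy() instead. Copying doubles your memory usage, so it is perhaps best to issue a del day1 after each addition to allDays.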

Having said all that, if you use pandas (usually recommended for time series data) or NumPy, this whole thing would be a lot quicker and use a lot less memory. NumPy arrays cannot hold pointers like Python lists can, so building one array out of several (for example with np.array or np.concatenate) copies the data. Hope that clears some things up for you :) I can also highly recommend this video by Ned
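For NumPy arrays specifically, the difference between a reference, a slice and an explicit copy can be checked with a short sketch. The array shape is an assumption for illustration; note that slicing an array with [:] gives a view onto the same data, not a copy.

```python
import numpy as np

# Hypothetical small daily array.
day1 = np.zeros((1, 2, 2))

all_days = [day1]    # the list stores a reference to day1
view = day1[:]       # slicing a NumPy array returns a view
copy = day1.copy()   # an explicit, independent copy

# Change the original after the fact.
day1[0, 0, 0] = 95

print(all_days[0][0, 0, 0])  # 95.0 (same object as day1)
print(view[0, 0, 0])         # 95.0 (shares memory with day1)
print(copy[0, 0, 0])         # 0.0  (unaffected)

print(np.shares_memory(day1, view))  # True
print(np.shares_memory(day1, copy))  # False
```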

3

Use allDays = np.concatenate((day1, day2)).
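A minimal sketch of this, assuming each daily array keeps a leading time axis of length 1 (the grid size 81 x 141 is only an example):

```python
import numpy as np

# Hypothetical daily arrays shaped (time=1, lon=81, lat=141).
day1 = np.random.randint(255, size=(1, 81, 141))
day2 = np.random.randint(255, size=(1, 81, 141))

# Join along the existing time axis (axis=0 is also the default).
allDays = np.concatenate((day1, day2), axis=0)
print(allDays.shape)  # (2, 81, 141)

# Each day keeps its own slot along the first axis.
print(np.array_equal(allDays[0], day1[0]))  # True
print(np.array_equal(allDays[1], day2[0]))  # True
```

np.concatenate joins along an axis that already exists, so all arrays must match in every other dimension.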

2

Let's start with some random data.

>>> import numpy as np
>>> day1 = np.random.randint(255, size=(1, 81, 141))

Your array has a dimension of size 1, so every time you want to access an element, you'll have to painstakingly type day1[0,x,y]. You can remove that unnecessary dimension with np.squeeze().

>>> day1[0,50,50]
36
>>> day1 = np.squeeze(day1)
>>> day1.shape
(81, 141)
>>> day1[50,50]
36

Now let's make some more of these.

>>> day2 = np.random.randint(255, size=day1.shape)
>>> day3 = np.random.randint(255, size=day1.shape)

You can put all of these in one big list and pass it to np.array(), which will create an array of shape (N, 81, 141), where N is the number of days you have.

>>> allDays = np.array([day1, day2, day3])
>>> allDays.shape
(3, 81, 141)

All the data from day1 are in index 0, from day2 in index 1, etc.

>>> allDays[0,50,50]
36
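If the days are processed one file at a time, an alternative to collecting them in a list is to preallocate the result and fill it day by day. This is only a sketch: process_day is a hypothetical placeholder for your own read-and-process step, and the sizes are assumed.

```python
import numpy as np

# Assumed sizes: 3 days on an 81 x 141 grid.
n_days, n_lon, n_lat = 3, 81, 141


def process_day(i):
    """Hypothetical placeholder: returns the processed (lon, lat) grid for day i."""
    rng = np.random.default_rng(i)  # seeded so each day is reproducible
    return rng.integers(0, 255, size=(n_lon, n_lat))


# Allocate the full (time, lon, lat) array once, then fill one day per iteration.
allDays = np.empty((n_days, n_lon, n_lat), dtype=np.int64)
for i in range(n_days):
    allDays[i] = process_day(i)

print(allDays.shape)  # (3, 81, 141)
```

Only one day's intermediate result is alive at a time this way, which may help when memory is tight.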
