
I'm trying to extract a time series for a temperature variable from a NetCDF file and write the resulting DataFrame to a .csv file. At the end of the code I get the following error: IndexError: index 18224 is out of bounds for axis 1 with size 1. Can you help me resolve this? Below are my code, comments and some print() output to help explain the problem. Thank you!

import netCDF4
from netCDF4 import Dataset
import numpy as np
import pandas as pd


data = Dataset(r'/gpfs/home/UDCPP/barrier_c/Test_NCO/temp_Corse_10m_201704.nc', 'r')

lat = data.variables['latitude'][:]
>>> print(lat)
[[41.123375  41.123375  41.123375  ... 41.123375  41.123375  41.123375 ]
 [41.1341975 41.1341975 41.1341975 ... 41.1341975 41.1341975 41.1341975]
 [41.14502   41.14502   41.14502   ... 41.14502   41.14502   41.14502  ]
 ...
 [43.26623   43.26623   43.26623   ... 43.26623   43.26623   43.26623  ]
 [43.2770525 43.2770525 43.2770525 ... 43.2770525 43.2770525 43.2770525]
 [43.287875  43.287875  43.287875  ... 43.287875  43.287875  43.287875 ]]

lon = data.variables['longitude'][:]
>>> print(lon)
[[ 8.218151   8.2326964  8.2472418 ... 10.1526892 10.1672346 10.18178  ]
 [ 8.218151   8.2326964  8.2472418 ... 10.1526892 10.1672346 10.18178  ]
 [ 8.218151   8.2326964  8.2472418 ... 10.1526892 10.1672346 10.18178  ]
 ...
 [ 8.218151   8.2326964  8.2472418 ... 10.1526892 10.1672346 10.18178  ]
 [ 8.218151   8.2326964  8.2472418 ... 10.1526892 10.1672346 10.18178  ]
 [ 8.218151   8.2326964  8.2472418 ... 10.1526892 10.1672346 10.18178  ]]

##Calvi is just an example
lat_calvi =  42.57
lon_calvi =  8.75

##Squared difference of lat and lon
sq_diff_lat = (lat - lat_calvi)**2
sq_diff_lon = (lon - lon_calvi)**2

##Identifying the index of the minimum value for lat and lon
min_index_lat = sq_diff_lat.argmin()
min_index_lon = sq_diff_lon.argmin()

temp = data.variables['TEMP'][:]
>>> print(temp)
[[[[14.295403480529785 14.60593032836914 15.037308692932129 ...
    13.44691276550293 13.448591232299805 13.447751998901367]
   [14.130069732666016 14.316385269165039 14.63278579711914 ...
    13.44691276550293 13.448591232299805 13.447751998901367]
   [14.061250686645508 14.13510513305664 14.323938369750977 ...
    13.44691276550293 13.448591232299805 13.447751998901367]
   ...
##create an empty table
##starting date index 7 + the date
starting_date = data.variables['time'].units[7] + '2017-04-01'
>>> starting_date
' 2017-04-01'

#ending date index 7 + the date
ending_date = data.variables['time'].units[7] + '2017-04-30'
>>> ending_date
' 2017-04-30'

date_range = pd.date_range(start = starting_date, end = ending_date)
>>> date_range
DatetimeIndex(['2017-04-01', '2017-04-02', '2017-04-03', '2017-04-04',
               '2017-04-05', '2017-04-06', '2017-04-07', '2017-04-08',
               '2017-04-09', '2017-04-10', '2017-04-11', '2017-04-12',
               '2017-04-13', '2017-04-14', '2017-04-15', '2017-04-16',
               '2017-04-17', '2017-04-18', '2017-04-19', '2017-04-20',
               '2017-04-21', '2017-04-22', '2017-04-23', '2017-04-24',
               '2017-04-25', '2017-04-26', '2017-04-27', '2017-04-28',
               '2017-04-29', '2017-04-30'],
              dtype='datetime64[ns]', freq='D')


df = pd.DataFrame(0, columns = ['temp'], index = date_range)
>>> df
                   temp
2017-04-01            0
2017-04-02            0
2017-04-03            0
2017-04-04            0
2017-04-05            0
2017-04-06            0
2017-04-07            0
2017-04-08            0
2017-04-09            0
2017-04-10            0
2017-04-11            0
2017-04-12            0
2017-04-13            0
2017-04-14            0
2017-04-15            0
2017-04-16            0
2017-04-17            0
2017-04-18            0
2017-04-19            0
2017-04-20            0
2017-04-21            0
2017-04-22            0
2017-04-23            0
2017-04-24            0
2017-04-25            0
2017-04-26            0
2017-04-27            0
2017-04-28            0
2017-04-29            0
2017-04-30            0



dt = np.arange(0, data.variables['time'].size)

for time_index in dt:
    df.iloc[time_index] = temp[time_index,min_index_lat ,min_index_lon]
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/gpfs/apps/miniconda3/lib/python3.7/site-packages/numpy/ma/core.py", line 3188, in __getitem__
    dout = self.data[indx]
IndexError: index 18224 is out of bounds for axis 1 with size 1


##save the time series into a .csv
df.to_csv('temp_test.csv')



Edit:

Below is the entire output of df.to_csv:

>>> df.to_csv('temp_test.csv')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/gpfs/apps/miniconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 3204, in to_csv
    formatter.save()
  File "/gpfs/apps/miniconda3/lib/python3.7/site-packages/pandas/io/formats/csvs.py", line 188, in save
    compression=dict(self.compression_args, method=self.compression),
  File "/gpfs/apps/miniconda3/lib/python3.7/site-packages/pandas/io/common.py", line 455, in get_handle
    f = open(path_or_buf, mode, encoding=encoding, newline="")
PermissionError: [Errno 13] Permission denied: 'temp_test.csv'
  • If that's not the entire Traceback, please post it all. Commented Jun 17, 2020 at 15:54

1 Answer

This can be done using xarray:

import xarray as xr
import pandas as pd


# open_dataset opens the file read-only by default; it takes no 'r' mode argument
data = xr.open_dataset(r'/gpfs/home/UDCPP/barrier_c/Test_NCO/temp_Corse_10m_201704.nc')
df = data.to_dataframe().reset_index()
df.to_csv('temp_test.csv')

Then check the contents of df, as it may include the bounds variables (bnds) that are in the NetCDF file.
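
If you prefer to keep the netCDF4/NumPy approach from the question, a likely cause of the IndexError is that latitude and longitude are 2D arrays, so argmin() returns an index into the flattened array (18224) rather than a row or column. TEMP also appears to have four dimensions, and the traceback says axis 1 has size 1, so that axis needs its own index. The sketch below uses synthetic arrays in place of the real file; the grid size, the (time, depth, y, x) dimension order and the random temperatures are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the file's variables (assumed layout, not read from the real file):
# 2D latitude/longitude grids and TEMP with dimensions (time, depth, y, x).
lat_1d = np.linspace(41.123375, 43.287875, 201)
lon_1d = np.linspace(8.218151, 10.18178, 136)
lon, lat = np.meshgrid(lon_1d, lat_1d)  # both arrays have shape (201, 136)

n_time = 30
rng = np.random.default_rng(0)
temp = rng.uniform(13.0, 16.0, size=(n_time, 1, lat.shape[0], lat.shape[1]))

lat_calvi, lon_calvi = 42.57, 8.75

# Add both squared differences so that one grid cell minimises the distance
sq_dist = (lat - lat_calvi) ** 2 + (lon - lon_calvi) ** 2

# argmin() gives a flat index; unravel_index converts it to (row, column)
iy, ix = np.unravel_index(sq_dist.argmin(), sq_dist.shape)

# All time steps at depth index 0 and the nearest grid cell, without a loop
series = temp[:, 0, iy, ix]

date_range = pd.date_range("2017-04-01", periods=n_time, freq="D")
df = pd.DataFrame({"temp": series}, index=date_range)
```

With the real file, lat, lon and temp would come from data.variables as in the question, and the DataFrame can then be written with df.to_csv as before.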

13 Comments

Thank you @Robert Wilson. I also tried xarray, and it works except for the last call, df.to_csv('temp_test.csv'), which raises: PermissionError: [Errno 13] Permission denied: 'temp_test.csv'. I don't know where the problem is (yet). Thank you for your help!
That probably means temp_test.csv is open in another application or in your Python script.
Thank you for your fast reply. I saw that this could be the cause, but in my case the file is not open in another application or in my script. Here is the entire output of df.to_csv:
This seems to be missing
I had to post it as an answer because it had too many characters.
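
If the file is not open anywhere, another thing that may be worth checking for the PermissionError above is whether the current working directory (or an existing temp_test.csv in it) is writable by your user, since a relative file name is resolved against that directory. The sketch below is only an illustration: writable_csv_path is a hypothetical helper, not part of pandas, and falling back to the system temporary directory is an assumption about what is acceptable on your cluster.

```python
import os
import tempfile


def writable_csv_path(filename, preferred_dir="."):
    """Hypothetical helper: return a path in preferred_dir if it looks
    writable, otherwise a path in the system temporary directory."""
    target_dir = os.path.abspath(preferred_dir)
    path = os.path.join(target_dir, filename)
    if os.path.exists(path):
        # Overwriting an existing file needs write permission on the file
        ok = os.access(path, os.W_OK)
    else:
        # Creating a new file needs write and search permission on the directory
        ok = os.access(target_dir, os.W_OK | os.X_OK)
    if ok:
        return path
    return os.path.join(tempfile.gettempdir(), filename)


out_path = writable_csv_path("temp_test.csv")
print("Working directory:", os.getcwd())
print("Writing to:", out_path)
# df.to_csv(out_path)  # df is the DataFrame built earlier
```

Note that os.access only checks the usual permission bits, so on network filesystems the write can still fail even when the check passes.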