Extracting time series data from a netCDF file into a .csv with python

Question

I hope to find help here. I'm trying to extract a time series for a temperature variable from a netCDF file and write the resulting dataframe to a .csv. At the end of the code, I get the following error: IndexError: index 18224 is out of bounds for axis 1 with size 1. Can you help me resolve this problem? Below are my code, comments, and some print() outputs to help you understand my problem. Thank you!

import netCDF4
from netCDF4 import Dataset
import numpy as np
import pandas as pd


data = Dataset(r'/gpfs/home/UDCPP/barrier_c/Test_NCO/temp_Corse_10m_201704.nc', 'r')

lat = data.variables['latitude'][:]
>>> print(lat)
[[41.123375  41.123375  41.123375  ... 41.123375  41.123375  41.123375 ]
 [41.1341975 41.1341975 41.1341975 ... 41.1341975 41.1341975 41.1341975]
 [41.14502   41.14502   41.14502   ... 41.14502   41.14502   41.14502  ]
 ...
 [43.26623   43.26623   43.26623   ... 43.26623   43.26623   43.26623  ]
 [43.2770525 43.2770525 43.2770525 ... 43.2770525 43.2770525 43.2770525]
 [43.287875  43.287875  43.287875  ... 43.287875  43.287875  43.287875 ]]

lon = data.variables['longitude'][:]
>>> print(lon)
[[ 8.218151   8.2326964  8.2472418 ... 10.1526892 10.1672346 10.18178  ]
 [ 8.218151   8.2326964  8.2472418 ... 10.1526892 10.1672346 10.18178  ]
 [ 8.218151   8.2326964  8.2472418 ... 10.1526892 10.1672346 10.18178  ]
 ...
 [ 8.218151   8.2326964  8.2472418 ... 10.1526892 10.1672346 10.18178  ]
 [ 8.218151   8.2326964  8.2472418 ... 10.1526892 10.1672346 10.18178  ]
 [ 8.218151   8.2326964  8.2472418 ... 10.1526892 10.1672346 10.18178  ]]

##Calvi is just an example
lat_calvi =  42.57
lon_calvi =  8.75

##Squared difference of lat and lon
sq_diff_lat = (lat - lat_calvi)**2
sq_diff_lon = (lon - lon_calvi)**2

##Identifying the index of the minimum value for lat and lon
min_index_lat = sq_diff_lat.argmin()
min_index_lon = sq_diff_lon.argmin()

temp = data.variables['TEMP'][:]
>>> print(temp)
[[[[14.295403480529785 14.60593032836914 15.037308692932129 ...
    13.44691276550293 13.448591232299805 13.447751998901367]
   [14.130069732666016 14.316385269165039 14.63278579711914 ...
    13.44691276550293 13.448591232299805 13.447751998901367]
   [14.061250686645508 14.13510513305664 14.323938369750977 ...
    13.44691276550293 13.448591232299805 13.447751998901367]
   ...

##create an empty table
##starting date index 7 + the date
starting_date = data.variables['time'].units[7] + '2017-04-01'
>>> starting_date
' 2017-04-01'

#ending date index 7 + the date
ending_date = data.variables['time'].units[7] + '2017-04-30'
>>> ending_date
' 2017-04-30'

date_range = pd.date_range(start = starting_date, end = ending_date)
>>> date_range
DatetimeIndex(['2017-04-01', '2017-04-02', '2017-04-03', '2017-04-04',
               '2017-04-05', '2017-04-06', '2017-04-07', '2017-04-08',
               '2017-04-09', '2017-04-10', '2017-04-11', '2017-04-12',
               '2017-04-13', '2017-04-14', '2017-04-15', '2017-04-16',
               '2017-04-17', '2017-04-18', '2017-04-19', '2017-04-20',
               '2017-04-21', '2017-04-22', '2017-04-23', '2017-04-24',
               '2017-04-25', '2017-04-26', '2017-04-27', '2017-04-28',
               '2017-04-29', '2017-04-30'],
              dtype='datetime64[ns]', freq='D')


df = pd.DataFrame(0, columns = ['temp'], index = date_range)
>>> df
                   temp
2017-04-01            0
2017-04-02            0
2017-04-03            0
2017-04-04            0
2017-04-05            0
2017-04-06            0
2017-04-07            0
2017-04-08            0
2017-04-09            0
2017-04-10            0
2017-04-11            0
2017-04-12            0
2017-04-13            0
2017-04-14            0
2017-04-15            0
2017-04-16            0
2017-04-17            0
2017-04-18            0
2017-04-19            0
2017-04-20            0
2017-04-21            0
2017-04-22            0
2017-04-23            0
2017-04-24            0
2017-04-25            0
2017-04-26            0
2017-04-27            0
2017-04-28            0
2017-04-29            0
2017-04-30            0



dt = np.arange(0, data.variables['time'].size)

for time_index in dt:
    df.iloc[time_index] = temp[time_index,min_index_lat ,min_index_lon]
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/gpfs/apps/miniconda3/lib/python3.7/site-packages/numpy/ma/core.py", line 3188, in __getitem__
    dout = self.data[indx]
IndexError: index 18224 is out of bounds for axis 1 with size 1


##save time series into a .csv
df.to_csv('temp_test.csv')

Edit:

Below, the entire output of df.to_csv:

>>> df.to_csv('temp_test.csv')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/gpfs/apps/miniconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 3204, in to_csv
    formatter.save()
  File "/gpfs/apps/miniconda3/lib/python3.7/site-packages/pandas/io/formats/csvs.py", line 188, in save
    compression=dict(self.compression_args, method=self.compression),
  File "/gpfs/apps/miniconda3/lib/python3.7/site-packages/pandas/io/common.py", line 455, in get_handle
    f = open(path_or_buf, mode, encoding=encoding, newline="")
PermissionError: [Errno 13] Permission denied: 'temp_test.csv'

If that's not the entire Traceback, please post it all.

– Trenton McKinney, Commented Jun 17, 2020 at 15:54

Robert Wilson · Accepted Answer · 2020-06-18 07:11:42Z

This can be done using xarray:

import xarray as xr
import pandas as pd


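# xr.open_dataset has no file-mode argument, so only the path is passed
data = xr.open_dataset(r'/gpfs/home/UDCPP/barrier_c/Test_NCO/temp_Corse_10m_201704.nc')
# flatten every variable into one table, with the dimensions as ordinary columns
df = data.to_dataframe().reset_index()
df.to_csv('temp_test.csv')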

Then check the contents of df, as it might include bounds variables (bnds) that are stored in the netCDF file.
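The to_dataframe() call above exports every grid cell. If only the series at one point is wanted, as in the question, the IndexError there appears to have two causes: latitude and longitude are 2-D arrays, so argmin() returns an index into the flattened array (hence 18224), and TEMP has an extra axis of size 1 at position 1 that the loop does not index. Below is a minimal sketch of one way around both, using small synthetic arrays in place of the file. The array shapes, the (time, depth, y, x) axis order, and the variable names iy, ix and series are assumptions, not taken from the actual file.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the netCDF variables (shapes are assumptions):
# 2-D latitude/longitude grids and TEMP with dimensions (time, depth, y, x).
n_time, n_y, n_x = 30, 5, 4
lat_1d = np.linspace(41.12, 43.29, n_y)
lon_1d = np.linspace(8.22, 10.18, n_x)
lon, lat = np.meshgrid(lon_1d, lat_1d)  # both have shape (n_y, n_x)
rng = np.random.default_rng(0)
temp = 13.0 + rng.random((n_time, 1, n_y, n_x))

lat_calvi, lon_calvi = 42.57, 8.75

# Add both squared differences so a single argmin finds the nearest cell.
sq_dist = (lat - lat_calvi) ** 2 + (lon - lon_calvi) ** 2

# argmin() on a 2-D array returns an index into the flattened array;
# unravel_index converts it back to a (row, column) pair.
iy, ix = np.unravel_index(sq_dist.argmin(), sq_dist.shape)

# Take every time step at depth index 0 and the nearest grid cell.
series = temp[:, 0, iy, ix]

date_range = pd.date_range(start="2017-04-01", end="2017-04-30")
df = pd.DataFrame({"temp": series}, index=date_range)
```

With the real file, lat, lon and temp would come from data.variables as in the question, and the depth index 0 should be checked against the dimensions of TEMP before relying on it.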


13 Comments

Céline Barrier Over a year ago

Thank you @Robert Wilson. I also tried with xarray, and it works except for the last call, df.to_csv('temp_test.csv'). I got the following error: PermissionError: [Errno 13] Permission denied: 'temp_test.csv'. I don't know where the problem is (yet). Thank you for your help!

Robert Wilson Over a year ago

That probably means temp_test.csv is open in another application or in your Python script.

Céline Barrier Over a year ago

Thank you for your fast reply. I had seen that this could be the cause, but in my case the file is not open in any application or in my script. Here is the entire output of df.to_csv:

Robert Wilson Over a year ago

This seems to be missing

Céline Barrier Over a year ago

I had to post it as an answer because it had too many characters.
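Errno 13 can also be raised when the current working directory, or an existing temp_test.csv, is not writable by the user running Python. That is only a possibility here, not a diagnosis of the case above. A minimal sketch of a check, where writable_csv_path is a hypothetical helper (not part of pandas) and the fallback to the system temporary directory is an assumption:

```python
import os
import tempfile


def writable_csv_path(filename, fallback_dir=None):
    """Return a path for `filename` in the current directory if it looks
    writable, otherwise a path inside `fallback_dir`.

    Hypothetical helper for illustration; os.access can be unreliable on
    some network filesystems, so treat the result as a hint.
    """
    cwd = os.getcwd()
    target = os.path.join(cwd, filename)
    if os.access(cwd, os.W_OK):
        # An existing read-only file would also raise PermissionError.
        if not os.path.exists(target) or os.access(target, os.W_OK):
            return target
    if fallback_dir is None:
        fallback_dir = tempfile.gettempdir()
    return os.path.join(fallback_dir, filename)


out_path = writable_csv_path("temp_test.csv")
print(out_path)
```

The result can then be passed to df.to_csv(out_path) in place of the bare file name.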
