Converting an accumulated variable to timestep values in a netCDF file with CDO

I have a netCDF file with about 100 timesteps on a grid, containing one variable that is accumulated over the timesteps. I now want to calculate the contribution of each timestep to the variable's value (i.e. the difference between consecutive timesteps).

Currently I use the following sequence:

  1. To extract every single timestep into a new file I use cdo seltimestep,$i ...,
  2. calculate each difference into a new file with cdo sub $i ${i-1} ...
  3. and merge those new files in the end with cdo mergetime ... into one single result file.

That seems very cumbersome to me and not ideal in terms of performance. Because of the number of timesteps I cannot use a cdo pipeline, so I have to create many intermediate files.

Is there a better way to convert an accumulated variable to timestep values with cdo (or something else, like nco/ncl)?

There are 3 answers

N1B4 On BEST ANSWER

numpy's diff computes the difference of consecutive entries.

I suspect you have a multi-dimensional variable in your file, so here is a generic example of how to do it:

import netCDF4
import numpy as np

ncfile = netCDF4.Dataset('./myfile.nc', 'r')
var = ncfile.variables['variable'][:,:,:] # [time x lat x lon]

# Differences with a step of 1 along the 'time' axis (0) 
var_diff = np.diff(var, n=1, axis=0) 
ncfile.close()

# Write out the new variable to a new file     
ntim, nlat, nlon = np.shape(var_diff)

ncfile_out = netCDF4.Dataset('./outfile.nc', 'w')
ncfile_out.createDimension('time', ntim)
ncfile_out.createDimension('lat', nlat)
ncfile_out.createDimension('lon', nlon)
var_out = ncfile_out.createVariable('variable', 'f4', ('time', 'lat', 'lon',))
var_out[:,:,:] = var_diff[:,:,:]
ncfile_out.close()
jhamman On

xarray is my tool of choice for this sort of thing:

import xarray as xr

# Open the netCDF file
ds = xr.open_dataset('./myfile.nc')

# Take the diff along the time dimension
ds['new_variable'] = ds['variable'].diff(dim='time')

# Write a new file
ds.to_netcdf('outfile.nc')
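
Note that diff returns one timestep fewer than the input. If the first accumulated step should be kept as its own increment, one option is to concatenate it back on. The following is only a sketch on synthetic data; the helper name deaccumulate and the tiny 4-step array are made up for illustration and are not part of the answer above.

```python
import numpy as np
import xarray as xr

# Synthetic accumulated field (hypothetical data): 4 timesteps on a 1x1 grid.
acc = xr.DataArray(
    np.array([1.0, 3.0, 6.0, 10.0]).reshape(4, 1, 1),
    dims=("time", "lat", "lon"),
    coords={"time": np.arange(4)},
    name="variable",
)

def deaccumulate(da, dim="time"):
    """Per-step increments, keeping the first accumulated step unchanged."""
    first = da.isel({dim: slice(0, 1)})  # accumulation from zero up to step 1
    steps = da.diff(dim=dim)             # n-1 differences, labelled by the later step
    return xr.concat([first, steps], dim=dim)

per_step = deaccumulate(acc)
```

With a real file, the same helper could be applied to ds['variable'] before writing the result out with to_netcdf.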
ClimateUnboxed On

If you want to use cdo, there is no need for all those loops and intermediate files; just use the deltat operator:

cdo deltat in.nc diff.nc 

Like the python solution, this will be orders of magnitude faster than the loops you were using, and has the advantage of being a command-line one-liner.

Alternatively, and much less concisely, you can difference the two shifted series if you know the number of timesteps (I show this because the technique can be useful in other contexts):

# calculate number of steps in the file:
nstep=$(cdo -s ntime in.nc)

# do difference between steps 2:n and steps 1:(n-1)
cdo sub -seltimestep,2/$nstep in.nc -seltimestep,1/$((nstep - 1)) in.nc diff.nc
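
For readers more at home in Python, the same shifted-series idea can be sketched with array slicing. This is an illustration on made-up data, not a replacement for the cdo command; the function name lagged_difference and the lag parameter are hypothetical and only there to show how the technique generalises beyond a step of 1.

```python
import numpy as np

# Hypothetical accumulated field, shape (time, lat, lon) = (4, 1, 2).
acc = np.array([1.0, 3.0, 6.0, 10.0]).reshape(4, 1, 1) * np.array([1.0, 2.0])

def lagged_difference(arr, lag=1):
    """Subtract steps 1..(n-lag) from steps (lag+1)..n along the time axis (0)."""
    return arr[lag:] - arr[:-lag]

# With lag=1 this matches np.diff along axis 0.
step_diff = lagged_difference(acc)

# A lag of 2 gives the change over two timesteps, which np.diff(n=1) does not.
two_step_diff = lagged_difference(acc, lag=2)
```

The two slices play the role of the two seltimestep selections in the cdo command.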

Postscript on accumulated fields! Note that the two solutions above and both Python solutions posted on this page produce output with one timestep fewer than the input, i.e. they throw away the first timestep. In some cases, for example a model flux field that is accumulated over the forecast (which seems to be the case here), you don't want to discard the first timestep, because it holds the accumulation from zero at the forecast start to the first step. In that case you can extract the first step and insert it at the "front" of the file like this:

cdo mergetime -seltimestep,1 in.nc diff.nc diff_with_step1.nc 

You should make sure to do the same for the Python solutions.
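
For the numpy approach, one way this might look is sketched below. The helper name deaccumulate and the small test array are made up for illustration, and the sketch assumes the data contain no missing values; netCDF4 can return masked arrays, in which case np.ma.concatenate would be the counterpart to use.

```python
import numpy as np

def deaccumulate(acc):
    """Per-step increments along axis 0, same length as the input.

    The first step is kept unchanged, because it already holds the
    accumulation from zero up to the first timestep.
    """
    return np.concatenate([acc[:1], np.diff(acc, axis=0)], axis=0)

# Hypothetical accumulated series with 3 timesteps and one grid point.
acc = np.array([[1.0], [3.0], [6.0]])
per_step = deaccumulate(acc)
```

A quick sanity check is that the cumulative sum of the result along the time axis reproduces the accumulated input.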

You can also pipe the whole thing as a one-liner (piping can sometimes lead to bus errors or seg faults; these can usually be remedied with the "-L" option, which enforces sequential operations):

cdo mergetime -seltimestep,1 in.nc -deltat in.nc diff_with_step1.nc

Try this if you get a seg fault:

cdo -L mergetime -seltimestep,1 in.nc -deltat in.nc diff_with_step1.nc

And use this to guard against rounding and accuracy issues if you have packed data (i.e. type NC_SHORT):

cdo -L -b f32 mergetime -seltimestep,1 in.nc -deltat in.nc diff_with_step1.nc