I am working with a nc file that has been created by merging single monthly nc files.
The actual file ('precip_combined.nc') contains global monthly precipitation data from 1979-2020 (size: 3.79 GB):
<xarray.Dataset>
Dimensions: (lon: 3600, lat: 1800, time: 504)
Coordinates:
* lon (lon) float32 -179.9 -179.8 -179.8 ... 179.8 179.9 179.9
* lat (lat) float32 89.95 89.85 89.75 ... -89.75 -89.85 -89.95
* time (time) datetime64[ns] 1979-01-01 1979-02-01 ... 2020-12-01
Data variables:
precipitation (time, lat, lon) float32 ...
Attributes:
history: Created on 2022-10-09 13:23
input_data_hash: 1ff7f07166074240030172bae541afd3ecbc27ef941e50a65b04a26...type here
The goal is:
- to calculate for each given coordinate (lat, lon) the annual sum of precipitation, based on the hydrological ("water") year which is commonly defined from October of year n until September of year n+1.
- store the results in a nc file with the new variable precipitation_sums for each year and for each coordinate.
I used this code:
import xarray as xr
# Open the netCDF file
data = xr.open_dataset('.../precip_combined.nc')
# Define the start and end months of the hydrological year
start_month = 10
end_month = 9
# Initialize an empty list to store the sums of precipitation
precipitation_sums = []
# Loop through each year from 1979 to 2020
for year in range(1979, 2021):
# Define the start and end dates of the hydrological year
start_date = f'{year}-{start_month:02d}-01'
end_date = f'{year+1}-{end_month:02d}-30'
# Select the data for the hydrological year
hydro_year_data = data.sel(time=slice(start_date, end_date))
# Calculate the sum of precipitation for each lon and lat coordinate
precipitation_sum = hydro_year_data['precipitation'].sum(dim=['lon', 'lat'])
# Append the sum to the list
precipitation_sums.append(precipitation_sum)
# Convert the list of sums to a new xarray dataset
precipitation_sums_dataset = xr.concat(precipitation_sums, dim='time')
# Add the precipitation sums dataset to the original dataset
data['precipitation_sums'] = precipitation_sums_dataset
# Save the updated dataset as a new netCDF file
data.to_netcdf('.../precip_annual_sums.nc')
and got as an output a nc file with following shapes:
<xarray.Dataset>
Dimensions: (lon: 3600, lat: 1800, time: 504)
Coordinates:
* lon (lon) float32 -179.9 -179.8 -179.8 ... 179.8 179.9 179.9
* lat (lat) float32 89.95 89.85 89.75 ... -89.75 -89.85 -89.95
* time (time) datetime64[ns] 1979-01-01 ... 2020-12-01
Data variables:
precipitation (time, lat, lon) float32 ...
precipitation_sums (time) float32 ...
Attributes:
history: Created on 2022-10-09 13:23
input_data_hash: 1ff7f07166074240030172bae541afd3ecbc27ef941e50a65b04a26...type here
As you see, the Data variable precipitation_sums does not contain any coordinates. For example, if I read the created nc file and extract data for London, I get no output.
Note that I need the global dataset, not for a specific location as asked here https://stackoverflow.com/questions/65072907/calculate-monthly-average-and-annual-sum-of-precipitation-from-netcdf-with-cdo
I would appreciate any help. There is no comprehensible example available on GitHub etc., probably the task is too easy for pros.
As you tagged the question with
cdoI presume acdoanswer is also acceptable for you, in this case you can shift the time index back 9 months, so October is now January, sum over the year, and then shift back so the timestamp is correct:Note:
--timestat_datewith optionslastandfirstto choose where the timestamp is defined (default is mid, so the data will April)