This is the first time I work with the spatial data so I really need your help.
I tried to download data ERA5-land monthly from this using python (with the format netcdf.zip). I chose the subregion that can cover France. After that I tried to read the data in R by the following code:
library(ncdf4)
library(raster)
library(sf)
library(ggplot2)
ncfile <- nc_open(filepath)
print(ncfile)
and I get the result like that: (I list variable t2m for example, the others have the same discription).
33 variables (excluding dimension variables):
short t2m[longitude,latitude,expver,time]
scale_factor: 0.000759505975281538
add_offset: 277.460679817325
_FillValue: -32767
missing_value: -32767
units: K
long_name: 2 metre temperature
.....
4 dimensions:
longitude Size:151
units: degrees_east
long_name: longitude
latitude Size:100
units: degrees_north
long_name: latitude
expver Size:2
long_name: expver
time Size:886
units: hours since 1900-01-01 00:00:00.0
long_name: time
calendar: gregorian
2 global attributes:
Conventions: CF-1.6
history: 2023-11-21 15:03:18 GMT by grib_to_netcdf-2.24.0: /opt/ecmwf/eccodes/bin/grib_to_netcdf -S param -o /cache/tmp/7a615d9b-759a-4ab0-a842-c978c79c8850-adaptor.mars.internal-1700578884.4900465-17448-6-tmp.nc /cache/tmp/7a615d9b-759a-4ab0-a842-c978c79c8850-adaptor.mars.internal-1700573425.5042555-17448-4-tmp.grib
(1) I don't know why the number of missing value equals to the number of fill value? I checked it and it turns out if I filtered expver = 2 than I get all NA, indeed, in the expver 1 there also exists NAs. when I tried to plot the data using this code:
t2m_var <- ncvar_get(ncfile, "t2m")
t2m_v1 <- t2m_var[,,1,880] #longitude, latitude, expver, time
t2m_v2 <- t2m_var[,,2,880]
t2m_slice <- t2m_v1
index <- is.na(t2m_slice)
t2m_slice[index] <- t2m_v2[index]
t2m_raster <- raster(t2m_slice, xmn=min(lon), xmx=max(lon), ymn=min(lat), ymx=max(lat), crs=CRS("+proj=longlat +datum=WGS84"))
(2) I got a weird picture where there are many empty space that corresponding to NA (it shouldn't be like that? because ERA5-data cover all the territory?) Temperature of air at 2m above the surface I'm thinking that did I miss something when working with this data.
(3) Can you help me to figure out which step was wrong and the way to fix it? Thank you so much !
You have two separate problems and you have to solve them individually because they are not related or compounding.
First of all, you have a mix of ERA5 and ERA5T data, with the latter type being the "experimental" data from the most recent period. Data is initially released as experimental. Only after additional quality control is data released as the final product. That process takes about 3 months so all data that is younger than 3 months is ERA5T. When you have a data set with a time series that straddles this 3-month boundary you get an additional dimension,
expver
, with dimension values 1 (final product, the older data) and 5 (the experimental recent data). That arrangement is explained in more detail here. If you want to blend this data, which is perfectly fine for many applications, you can do so with some effort. The 3D arrays with the twoexpver
values are complementary: where one has values, the otherNA
values. There is no specific information in the file that indicates how many valid time steps there are in eachexpver
array but you can simply read the data, then drop the slices in the "time" dimension that are allNA
values andabind::abind()
the remaining data.Secondly, your picture is not at all weird and there are no
NA
values in funny places. You have ERA5land data so there is data on all land areas andNA
s on oceans. Where things go wrong is how you map the data. Your image is obviously mirrored, with Bretagne pointing upwards, the Alps over the Pyrenees and vice-versa. The "culprit" here is that R organises its data in column-major order (from the top-left corner values progress downwards and then by column to the right) (reading chapter 5 in the R Manual would give you all the details), while typical mapping software uses a row-major order (from the top-left corner work across, then down). If you passraster::raster()
an NetCDF file it will interpret this correctly but if you give it an array it has no way of knowing if the data has been corrected for this difference in storage. You can easily correct this by permutating the 3D array (so after merging the twoexpver
slices) withaperm(., c(2, 1, 3))
.