How to manipulate data ERA5-land in R

205 views Asked by At

This is the first time I work with the spatial data so I really need your help.

I tried to download data ERA5-land monthly from this using python (with the format netcdf.zip). I chose the subregion that can cover France. After that I tried to read the data in R by the following code:

library(ncdf4)
library(raster)
library(sf)
library(ggplot2)

ncfile <- nc_open(filepath)
print(ncfile)

and I get the result like that: (I list variable t2m for example, the others have the same discription).

33 variables (excluding dimension variables):
        
        short t2m[longitude,latitude,expver,time]   
            scale_factor: 0.000759505975281538
            add_offset: 277.460679817325
            _FillValue: -32767
            missing_value: -32767
            units: K
            long_name: 2 metre temperature
        ..... 

     4 dimensions:
        longitude  Size:151 
            units: degrees_east
            long_name: longitude
        latitude  Size:100 
            units: degrees_north
            long_name: latitude
        expver  Size:2 
            long_name: expver
        time  Size:886 
            units: hours since 1900-01-01 00:00:00.0
            long_name: time
            calendar: gregorian

    2 global attributes:
        Conventions: CF-1.6
        history: 2023-11-21 15:03:18 GMT by grib_to_netcdf-2.24.0: /opt/ecmwf/eccodes/bin/grib_to_netcdf -S param -o /cache/tmp/7a615d9b-759a-4ab0-a842-c978c79c8850-adaptor.mars.internal-1700578884.4900465-17448-6-tmp.nc /cache/tmp/7a615d9b-759a-4ab0-a842-c978c79c8850-adaptor.mars.internal-1700573425.5042555-17448-4-tmp.grib

(1) I don't know why the number of missing value equals to the number of fill value? I checked it and it turns out if I filtered expver = 2 than I get all NA, indeed, in the expver 1 there also exists NAs. when I tried to plot the data using this code:

t2m_var <- ncvar_get(ncfile, "t2m")
t2m_v1 <- t2m_var[,,1,880]  #longitude, latitude, expver, time
t2m_v2 <- t2m_var[,,2,880]
t2m_slice <- t2m_v1
index <- is.na(t2m_slice)
t2m_slice[index] <- t2m_v2[index]
t2m_raster <- raster(t2m_slice, xmn=min(lon), xmx=max(lon), ymn=min(lat), ymx=max(lat), crs=CRS("+proj=longlat +datum=WGS84"))

(2) I got a weird picture where there are many empty space that corresponding to NA (it shouldn't be like that? because ERA5-data cover all the territory?) Temperature of air at 2m above the surface I'm thinking that did I miss something when working with this data.

(3) Can you help me to figure out which step was wrong and the way to fix it? Thank you so much !

1

There are 1 answers

1
Patrick On

You have two separate problems and you have to solve them individually because they are not related or compounding.

First of all, you have a mix of ERA5 and ERA5T data, with the latter type being the "experimental" data from the most recent period. Data is initially released as experimental. Only after additional quality control is data released as the final product. That process takes about 3 months so all data that is younger than 3 months is ERA5T. When you have a data set with a time series that straddles this 3-month boundary you get an additional dimension, expver, with dimension values 1 (final product, the older data) and 5 (the experimental recent data). That arrangement is explained in more detail here. If you want to blend this data, which is perfectly fine for many applications, you can do so with some effort. The 3D arrays with the two expver values are complementary: where one has values, the other NA values. There is no specific information in the file that indicates how many valid time steps there are in each expver array but you can simply read the data, then drop the slices in the "time" dimension that are all NA values and abind::abind() the remaining data.

Secondly, your picture is not at all weird and there are no NA values in funny places. You have ERA5land data so there is data on all land areas and NAs on oceans. Where things go wrong is how you map the data. Your image is obviously mirrored, with Bretagne pointing upwards, the Alps over the Pyrenees and vice-versa. The "culprit" here is that R organises its data in column-major order (from the top-left corner values progress downwards and then by column to the right) (reading chapter 5 in the R Manual would give you all the details), while typical mapping software uses a row-major order (from the top-left corner work across, then down). If you pass raster::raster() an NetCDF file it will interpret this correctly but if you give it an array it has no way of knowing if the data has been corrected for this difference in storage. You can easily correct this by permutating the 3D array (so after merging the two expver slices) with aperm(., c(2, 1, 3)).