Best way to combine lists netcdf into one dataframe in R - Nested for loops or mapply?


I am trying to combine multiple netcdf files with multiple variables:

- 6 types of parameters
- 36 years
- 12 months
- 31 days
- 6 Y coordinates
- 5 X coordinates

Each NetCDF file contains the data for one month of one year and one parameter, so there are 36 * 12 = 432 files per parameter and 432 * 6 = 2592 files in total.

How would I best combine all of this into one data frame? In the end it should look something like this:

rowID   Date        year  month day coord.X coord.Y par1 par2  par3  par4 par5  par6       
1       1979-01-01  1979  01    01  176     428     3.2  0.005 233.5 0.1  12.2  4.4
..................... 402568 rows in between.................
402570  2014-12-31  2014  12    31  180     433     1.7  0.006 235.7 0.2  0.0   2.7

How would I best combine this? I have been struggling with it for quite some time...

Excuse me for not knowing how to make this question reproducible, but there are so many elements involved. This is where I got my files from: ftp://rfdata:[email protected]/WFDEI/

This is what I have so far. I think this is what they call a nested loop, right? I usually just try and try until it works in the end, but I find this a tough job. Any recommendations on first steps are welcome, please.

require(ncdf4)
directory<-c("C:/folder/")                              # general folder
parameter<-c("par1","par2","par3","par4","par5","par6") # names of 6 parameters
directory2<-c("_folder2/")                              # parameter specific folder
directory3<-c("name")                                   # last part of folder name
years<-c("1979","otheryears","2014")                    # years which are also part of netcdf file name
months<-c("01","othermonths","12")                      # months which are also part of netcdf file name
x=c(176:180)                                            # X-coordinates
y=c(428:433)                                            # Y-coordinates



 require(plyr)

 for (p in parameter){
assign(paste0(p,"list"), list())
  for (i in years){
   for (j in months){
    for (k in x){
      for (l in y){
fileloc<-paste(directory,p,directory2,p,directory3,i,j,".nc",sep="") #location to open
    ncin<-nc_open(fileloc)
assign(paste0(p))<-ncvar_get(ncin,p)                         # extract the desired parameter from the netcdf list "ncin" and store in vector with name of parameter
day<-ncvar_get(ncin,"day")                                   # extract the day of month from the netcdf list "ncin"
par.coord<-paste(p,"[",y,",",x,",","]",sep="")               #string with function to select coordinates
temp<-data.frame(i,j,day,p=par.coord)                        # store day and parameter in dataframe
temp<-cbind(date=as.Date(with(temp,paste(i,j,day,sep="-")),"%Y-%m-%d"),temp,Y=y,X=x)                                               # Add date and coordinates to df
assign(paste0(p,"list"), list(temp)                          #store multiple data frames in a list.. I think?
    }assign(paste0(p,"list"), do.call(rbind,data)            # something to bind the dataframes by row in a list
}}}}

There are 2 answers

T. BruceLee On BEST ANSWER

To extract the data from all the NetCDF files and group it in one data frame by:

- 6 parameters
- 36 years
- 12 months
- 31 days
- 6 Y coordinates
- 5 X coordinates

First, I made sure all *.nc files were in one folder. Second, I collapsed the nested for loops into a single loop over the files, since the year, month and parameter can all be read from the file name. The day, X coordinate and Y coordinate can be extracted together as one array:

require(arrayhelpers);require(stringr);require(plyr);require(ncdf4)
# store all files from ftp://rfdata:[email protected]/WFDEI/ in the following folder:
setwd("C:/folder")
temp <- list.files(pattern = "\\.nc$")          # list all the NetCDF file names
param <- gsub("_\\S+", "", temp, perl = TRUE)   # extract the parameter from the file name (text before the first "_")
temp_year <- str_sub(temp, -9, -6)              # characters 9 to 6 counted from the end of the file name: the year
temp_month <- str_sub(temp, -5, -4)             # characters 5 to 4 counted from the end of the file name: the month

xcoord <- seq(176, 180, by = 1)                 # the X-coordinates you are interested in
ycoord <- seq(428, 433, by = 1)                 # the Y-coordinates you are interested in

list_var <- vector("list", length(temp))        # one list slot per file
for (t in seq_along(temp)) {
  temp_netcdf <- nc_open(temp[t])
  n_days <- length(ncvar_get(temp_netcdf, "day"))                          # number of days in this month
  # name of each dimension of the variable, in the order stored in the file
  dim.order <- sapply(temp_netcdf[["var"]][[param[t]]][["dim"]], function(x) x$name)
  start <- c(lon = min(ycoord), lat = min(xcoord), tstep = 1)              # first index to read along each dimension
  count <- c(lon = length(ycoord), lat = length(xcoord), tstep = n_days)   # number of values to read along each dimension
  tempstore <- ncvar_get(temp_netcdf, param[t], start = start[dim.order], count = count[dim.order])  # array with parameter values
  nc_close(temp_netcdf)                         # close the file; leaving many files open can cause errors

  # convert the array to a data frame (assumes the dimensions come back in the order lon, lat, tstep)
  df_temp <- array2df(tempstore, levels = list(lon = ycoord, lat = xcoord, day = NA), label.x = "value")
  # the day is the slowest-varying dimension, so each day number is repeated once per grid cell
  temp_day <- rep(seq_len(n_days), each = length(xcoord) * length(ycoord))
  Add_date <- as.Date(paste(temp_year[t], temp_month[t], temp_day, sep = "-"), "%Y-%m-%d")  # vector with the dates
  list_var[[t]] <- data.frame(Add_date, df_temp, parameter = param[t])     # add dates and store in the list of all output
}
All_NetCDF_var_in1df <- do.call(rbind, list_var)
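
The step in the loop above that turns an array into rows can be checked on a toy array. This is only a minimal sketch: it swaps in base R's expand.grid() for arrayhelpers::array2df(), assumes the array dimensions come back in the order lon, lat, time, and uses made-up values and a hypothetical three-day month.

```r
# Toy array standing in for one month of one parameter: 6 x 5 grid cells, 3 days.
ycoord <- 428:433   # first dimension (called "lon" in the answer's code)
xcoord <- 176:180   # second dimension (called "lat" in the answer's code)
n_days <- 3         # hypothetical month length, for illustration only
values <- array(seq_len(6 * 5 * n_days),
                dim = c(length(ycoord), length(xcoord), n_days))

# expand.grid() varies its first argument fastest, which matches the way R
# stores arrays (column-major), so the rows line up with as.vector(values).
long <- expand.grid(lon = ycoord, lat = xcoord, day = seq_len(n_days))
long$value <- as.vector(values)

# Year and month would come from the file name; "1979" and "01" are placeholders.
long$date <- as.Date(sprintf("%s-%s-%02d", "1979", "01", long$day))

head(long, 3)
```

Because the day is the slowest-varying dimension, the first 30 rows (6 x 5 grid cells) all belong to day 1, the next 30 to day 2, and so on. If a file stores its dimensions in a different order, the expand.grid() arguments would have to follow that order.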

#### If you want to take a look at the NetCDF files first, use:
list2env(
  lapply(setNames(temp, make.names(gsub("\\.nc$", "", temp))),
         nc_open), envir = .GlobalEnv) # open every file and store the handles in the global environment, named after the files
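
The combined data frame is in long format (one row per value, with a parameter column), while the question asks for one column per parameter. One possible way to get there is base R's reshape(). The sketch below uses a tiny made-up data frame with the same column roles (Add_date, lon, lat, value, parameter); the numbers are placeholders, not real data.

```r
# Two grid cells, two days, two parameters (all values are made up).
keys <- expand.grid(lon = 428:429, lat = 176, day = 1:2)
keys$Add_date <- as.Date(sprintf("1979-01-%02d", keys$day))

long <- rbind(
  data.frame(keys, value = c(3.2, 3.0, 2.8, 2.5), parameter = "par1"),
  data.frame(keys, value = c(0.005, 0.004, 0.006, 0.007), parameter = "par2")
)

# One row per date and grid cell, one column per parameter.
wide <- reshape(long,
                idvar = c("Add_date", "lon", "lat"),
                timevar = "parameter",
                v.names = "value",
                direction = "wide")

# reshape() names the new columns value.par1, value.par2; strip the prefix.
names(wide) <- sub("^value\\.", "", names(wide))
wide
```

With the full data set this step can be memory-hungry, so it may be worth reshaping per year, or using a dedicated package, if it turns out to be slow.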
russellpierce On

There are many ways to skin a cat like that. Nested loops are perhaps a bit easier to debug if you're new to R. One question I think you want to ask yourself is whether the files have primacy or your conceptual structure has primacy. That is, if your conceptual structure specifies a location for which there isn't a file, what do you want your code to do? If you only want to parse extant files, I find it useful to call list.files(, full.names = TRUE, recursive = TRUE) to find the files I want to parse, and then write a function that parses a single file (and its name) to produce the data structure I want. From there, it is a single lapply or purrr::map call over the file list.
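
A minimal sketch of that files-first workflow, under stated assumptions: the folder layout, the file-name pattern (parameter_YYYYMM.nc) and the helper parse_one_file() are hypothetical, and the stand-in files are empty, so only the names are parsed. A real version would also open each file (for example with ncdf4::nc_open) inside the helper.

```r
# Throwaway folder with empty stand-in files (hypothetical names).
root <- file.path(tempdir(), "wfdei_demo")
dir.create(file.path(root, "par1"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "par2"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(root, "par1", c("par1_197901.nc", "par1_197902.nc"))))
invisible(file.create(file.path(root, "par2", "par2_197901.nc")))

# Parse a single file. Only the name is used here.
parse_one_file <- function(path) {
  stem <- sub("\\.nc$", "", basename(path))
  data.frame(
    parameter = sub("_.*$", "", stem),
    year      = substr(stem, nchar(stem) - 5, nchar(stem) - 2),
    month     = substr(stem, nchar(stem) - 1, nchar(stem)),
    file      = path,
    stringsAsFactors = FALSE
  )
}

# Files have primacy: parse whatever exists.
files <- list.files(root, pattern = "\\.nc$", full.names = TRUE, recursive = TRUE)
found <- do.call(rbind, lapply(files, parse_one_file))

# Conceptual structure has primacy: list every expected combination
# and see which ones have no file.
expected <- expand.grid(parameter = c("par1", "par2"), year = "1979",
                        month = c("01", "02"), stringsAsFactors = FALSE)
absent <- merge(expected, found, all.x = TRUE)
absent <- absent[is.na(absent$file), c("parameter", "year", "month")]
absent
```

Comparing the two tables makes the choice explicit: the code can skip the combinations in absent, fill them with NA rows, or stop with an error.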