Merge thousands of IMERG 30-min rainfall netCDF files into a single netCDF


I have 8736 nc4 files (30-minute rainfall from 1 Jun to 31 Dec 2000) downloaded from https://disc.gsfc.nasa.gov/datasets/GPM_3IMERGHH_06/summary?keywords=IMERG with the naming convention:

3B-HHR.MS.MRG.3IMERG.20000601-S000000-E002959.0000.V06B.HDF5.nc4

3B-HHR.MS.MRG.3IMERG.20000601-S003000-E005959.0030.V06B.HDF5.nc4

Start Date/Time: All files in GPM will be named using the start date/time of the temporal period of the data contained in the product. The field has two subfields separated by a hyphen.

Start Date: YYYYMMDD
Start Time: Begins with capital S, followed by HHMMSS
End Time: Begins with capital E, followed by HHMMSS

Hours are presented in a 24-hour time format, with ‘00’ indicating midnight. All times in GPM will be in Coordinated Universal Time (UTC).

The half-hour sequence starts at 0000, and increments by 30 for each half hour of the day.

I would like to merge all the files into a single nc4. The reason is that I want to do further processing, i.e. calculate rolling sums to get 6- or 12-hour rainfall accumulations, and other analyses.
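
For context, after merging I plan to compute the running accumulations with something like this (untested, based on my reading of the CDO documentation, where runsum,N sums over N consecutive timesteps):

# Planned follow-up on the merged file (30-minute timesteps)
cdo runsum,12 output.nc4 rain_6h.nc4    # 12 x 30 min = 6-hour accumulation
cdo runsum,24 output.nc4 rain_12h.nc4   # 24 x 30 min = 12-hour accumulation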

I followed the suggestion from another similar topic and used cdo mergetime file*.nc4 output.nc4 and ncecat file*.nc4 output.nc4, but both failed with the error "argument list too long".

As suggested in the answer below, I split the files into separate lists (by month) using the following script:

for i in $(seq -f "%02g" 1 12); do mkdir -p "Month$i"; mv 3B-HHR.MS.MRG.3IMERG.????$i*.nc4 "Month$i"; done

I also increased the limit; ulimit -s on my Mac now reports 65536.

Then I tried again with ncecat file*.nc4 output.nc4 in a folder with 1440 files, and it worked.

But I just realized that the result has record as the UNLIMITED dimension and time = 1.

[screenshot of ncdump output]

When I open output.nc4 using Panoply, Record = 1440 and Time has only one entry: 1 Jun 2000.

[screenshot of Panoply]

This is new to me as a new user; I was expecting output similar to what I get with daily or monthly data, where the time dimension is the UNLIMITED one.

Any suggestions on how to solve the above problem? Is there a step I am missing?


There are 3 answers

Answer by Charlie Zender

Sounds like a shell limitation (possibly Windows?) to me. ncecat keeps at most 3 files open at one time. The NCO Users Guide describes multiple workarounds for handling arbitrarily long lists of input files. At least one of these methods will work for you. HINT: Try the -n option combined with symbolic links as shown in the manual.

Edit in response to comment, 2020-10-22: Here is how the manual demonstrates creating nicely named symbolic links to a million files:

# Create enumerated symbolic links
/bin/ls | grep \.nc | perl -e \
'$idx=1;while(<STDIN>){chop;symlink $_,sprintf("%06d.nc",$idx++);}'
ncecat -n 999999,6,1 000001.nc foo.nc
# Remove symbolic links when finished
/bin/rm ??????.nc

You can shorten the list of arguments piped to /bin/ls by constraining it with a pattern so the shell stops complaining, and repeat until all your files have a link. Then you execute the single ncecat command shown in the example, with one filename, and you are done.
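
As an illustration only (not from the manual), the same enumerated links can be created with a plain shell loop, which sidesteps the argument limit entirely because the glob is expanded inside the loop rather than passed to an external command:

# Untested sketch: link the 8736 half-hourly files month by month,
# then run a single ncecat over the enumerated links
idx=1
for m in 06 07 08 09 10 11 12; do
  for f in 3B-HHR.MS.MRG.3IMERG.2000${m}*.nc4; do
    ln -s "$f" "$(printf '%06d.nc' "$idx")"
    idx=$((idx + 1))
  done
done
ncecat -n 8736,6,1 000001.nc output.nc4  # -n takes the actual number of links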

Edit in response to newest question, 2020-11-01:

It seems like you used ncecat when what you really need is ncrcat; the difference between them is a bit subtle. Now that you have solved the shell limit, the easiest fix is simply to re-run the command with ncrcat instead of ncecat:

ncrcat file*.nc4 output.nc4
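
If the un-split glob still trips the argument limit, the -n/symbolic-link approach described above should, as far as I can tell, work with ncrcat as well, since it is also a multi-file operator, e.g.:

ncrcat -n 8736,6,1 000001.nc output.nc4
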
Answer by Robert Wilson

This is almost certainly an OS-specific problem. If you are on Linux, you can only have 1024 files open at once by default. I do not know about macOS.
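
You can check the current per-process soft limit on open files with:

ulimit -n   # typically prints 1024 on Linux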

You could change the limit (e.g. see here), but that is probably not a good idea.

So the best thing would be to split the files into 9 separate lists, create 9 merged files from those lists, and then merge those files.
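
One way to do that (a rough, untested sketch assuming CDO and the original file names; any of the merge tools could follow the same pattern) is to merge in 10-day chunks of at most ~480 files, comfortably under a 1024 open-file limit, and then merge the chunk files:

# Merge ~480 files at a time (10-day chunks), then merge the chunk files
for m in 06 07 08 09 10 11 12; do
  for d in 0 1 2 3; do   # days 01-09, 10-19, 20-29, 30-31
    cdo mergetime 3B-HHR.MS.MRG.3IMERG.2000${m}${d}*.nc4 chunk_2000${m}${d}.nc4
  done
done
cdo mergetime chunk_*.nc4 output.nc4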

Answer by ClimateUnboxed

I think it is a stack limit on the argument size passed to the command; you can see this by typing

ulimit -s 

and you will probably get an answer of 8192.

You can try increasing this, e.g.

ulimit -s 32768

and see if that resolves the problem. On my Mac I could not go above this new value; attempting to set the soft limit to 65536 gave me a "ulimit: value exceeds hard limit" error.
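
For reference, the hard ceiling that the soft limit cannot exceed can be checked with:

ulimit -Hs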