Are there python functions to grab netcdf indices, counts by value, and etc

210 views Asked by At

I am not experienced in python, and I am converting an NCL script to python, in hopes python will run much faster. Searching around, I am not finding an answer to what I think are the simplest computations in the NCL script. Looking how the tougher computations are done, I am also not finding an answer to how these may be done in python.

The bulk of the computations are done after converting the 3-dimensional variables to 1-dimensional variables, and querying their values and positions in array space. Knowing the t variable positions in array space, we can get the p variable values that correspond to the t variable integer values.

The computations are as follows:

  • Set a value in the p variable to its default _FillValue,
  • count the number (volume) of grid points a value occurs for each possible integer value in the t variable (a sum in time and space),
  • compute the start and end time indices for each of the possible integer values in the t variable,
  • compute the duration time as the difference (+ 1 because numbers) between the end and start time in the t variable,
  • compute the average (space time) latitude and longitude for each of the possible integer values in the t variable,
  • compute the area (volume/duration) for each possible integer value in the t variable,
  • compute the average p from the p variable where it corresponds in space time to each possible integer value in the t variable, and
  • compute the p percentiles from the p variable where it corresponds in space time to each possible integer value in the t variable.

All of these computations save the values in 1-dimensional arrays, with dimension sizes equal to the maximum integer value in the t variable. For example, a t variable may have integers from 0 to 100. The 0 integer value is ignored, so each of the 1-dimensional arrays should have 100 values in the example; (100 volumes, 100 start times, 100 end times, etc.).

Finally, all of the 1-dimensional arrays are written to a (tab delimited) text file, with each column being the 1-dimensional arrays.

;===================================================================
load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/gsn_code.ncl"
load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/gsn_csm.ncl"
load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/contributed.ncl"
;===================================================================
begin
;===============================================================
begTime = get_cpu_time()
; Data I/O and data names
; T output, and raw data input
f_t = addfile("t_in.nc","r")
f_p = addfile("p_in.nc","r")
; Data variables
time = f_t->time
lat = f_t->lat
lon = f_t->lon
t_var = f_t->t
p_var = f_p->p

p_fix = p_var
p_fix = where(p_var.eq.9.96921e+36, p_var@_FillValue, p_var)
delete(p_var)
p_var = p_fix
delete(p_fix)

; t = 0, is not measured

; Compute volume, start and end time indices, delta time, centroid lat and centroid lon, area, and percentiles
volume = new(max(t_var)+1, "integer")
start_time = new(max(t_var)+1, "integer")
end_time = new(max(t_var)+1, "integer")
delta_time = new(max(t_var)+1, "integer")
centroid_lat = new(max(t_var)+1, "double")
centroid_lon = new(max(t_var)+1, "double")
area = new(max(t_var)+1, "float", -9999.)
v_av = new(max(t_var)+1, "float", -9999.)
p_10 = new(max(t_var)+1, "float", -9999.)
p_25 = new(max(t_var)+1, "float", -9999.)
p_50 = new(max(t_var)+1, "float", -9999.)
p_75 = new(max(t_var)+1, "float", -9999.)
p_90 = new(max(t_var)+1, "float", -9999.)
t1D = ndtooned(t_var)
p1D = ndtooned(p_var)
dsizes_t = dimsizes(t_var)
do i=1,max(t_var)
  indices_t = ind_resolve(ind(t1D.eq.i),dsizes_t)
  volume(i) = num(t_var.eq.i)
  start_time(i) = indices_t(0,0)
  end_time(i) = indices_t(dimsizes(indices_t(:,0))-1,0)
  delta_time(i) = 1+end_time(i)-start_time(i)
  centroid_lat(i) = avg(lat(indices_t(:,1)))
  centroid_lon(i) = avg(lon(indices_t(:,2)))
  area(i) = volume(i)/delta_time(i)
  v_av(i) = avg(p1D(ind(t1D.eq.i)))
  p_10(i) = Percentile(p1D(ind(t1D.eq.i)),10)
  p_25(i) = Percentile(p1D(ind(t1D.eq.i)),25)
  p_50(i) = Percentile(p1D(ind(t1D.eq.i)),50)
  p_75(i) = Percentile(p1D(ind(t1D.eq.i)),75)
  p_90(i) = Percentile(p1D(ind(t1D.eq.i)),90)
  delete(indices_t)
end do

; Write data as table to text file
r = ispan(1,max(t_var),1)

system("/bin/rm -f var.txt")
fname = "var.txt"
fhead = systemfunc("echo -e tnum $'\t' start $'\t' end $'\t' dt $'\t' c_lat $'\t' c_lon $'\t' vol $'\t' area $'\t' v_avg $'\t' p_10 $'\t' p_25 $'\t' p_50 $'\t' p_75 $'\t' p_90 >> "+fname)
print(fhead)
do i=1,max(t_var)
  str_var = sprinti("%8.0i",r(i-1))+"$'\t'"+sprinti("%4.0i",start_time(i))+"$'\t'"+sprinti("%4.0i",end_time(i))+"$'\t'"+sprinti("%4.0i",delta_time(i))+"$'\t'"+\
            sprintf("%2.2f",centroid_lat(i))+"$'\t'"+sprintf("%3.2f",centroid_lon(i))+"$'\t'"+\
            sprinti("%10.0i",volume(i))+"$'\t'"+sprintf("%8.2f",area(i))+"$'\t'"+sprintf("%3.2f",v_av(i))+"$'\t'"+\
            sprintf("%3.2f",p_10(i))+"$'\t'"+sprintf("%3.2f",p_25(i))+"$'\t'"+\
            sprintf("%3.2f",p_50(i))+"$'\t'"+sprintf("%3.2f",p_75(i))+"$'\t'"+\
            sprintf("%3.2f",p_90(i))
  cmd = systemfunc("echo -e " + str_var + " >> "+fname)
  print(cmd)
end do

print("Total run time: " + (get_cpu_time() - begTime))
end
1

There are 1 answers

0
le_affan On

The variety of data that may be stored in a NetCDF file won't allow me to assure you that it will solve your problem, but the Python bindings of the GDAL library with its dedicated driver for NetCDF are certainly a good starting point if you want to explore files of this format via Python code.

Once you've been able to access your data using this library you'll most likely find help with your actual problem around here if you're ready to break it up into smaller pieces instead of throwing a whole NCL script out there to be converted into Python.