I have a DataFrame with time-series data spanning many years; it contains daily values of a variable at different lat/lon locations. On a given day, the variable is recorded at several locations. Here is a snippet of the DataFrame, which I read with pandas:
```
                 lat      lon   variable
Date
2017-12-31  12.93025  59.9239  10.459373
2019-12-31  12.53044  43.9229  12.730064
2019-02-28  12.37841  33.9245  37.487683
```
I want to:
- Grid it to a 2 x 2.5 degree resolution.
- Build a 3D array that holds the gridded data together with its time variation, i.e. an array of shape (time, lat, lon). I need this because the gridded data has to be compared with global meteorology data at a resolution of 2 x 2.5 degrees. My dataset does not record data at every location on every day, so the final array also has to handle missing data.
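For reference, the 2 x 2.5 degree bin edges can be built with `np.linspace`. This is a minimal sketch assuming cells whose edges start at -90 and -180; some global meteorology grids are instead centered on the poles or on -180, so the edges should be checked against the target grid.

```python
import numpy as np

DLAT, DLON = 2.0, 2.5  # target resolution in degrees

# Assumption: cell edges start at -90 (lat) and -180 (lon)
lat_edges = np.linspace(-90, 90, int(180 / DLAT) + 1)    # 91 edges -> 90 cells
lon_edges = np.linspace(-180, 180, int(360 / DLON) + 1)  # 145 edges -> 144 cells

# Cell centers, handy as coordinates for a (time, lat, lon) array
lat_centers = 0.5 * (lat_edges[:-1] + lat_edges[1:])
lon_centers = 0.5 * (lon_edges[:-1] + lon_edges[1:])
```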
I have looked into geopandas, xarray and histogram2d for gridding the data, and I have successfully gridded the data with the histogram2d function. However, I could only obtain a 2D array, which lacks the time information and makes my analysis difficult. Ideally I should add the time dimension to my 2D array, but I am struggling with how exactly to do so, given that not all locations record data at all times.
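One way to add the time dimension is to grid each day separately with `np.histogram2d` and stack the daily grids with `np.stack`. The sketch below assumes a DatetimeIndex, one time step per calendar day, and cells left as NaN when they have no observation; the toy DataFrame and the helper `grid_one_day` are hypothetical.

```python
import numpy as np
import pandas as pd

def grid_one_day(lat, lon, values, lat_edges, lon_edges):
    """Hypothetical helper: mean of `values` per (lat, lon) cell, NaN where empty."""
    sums, _, _ = np.histogram2d(lat, lon, bins=(lat_edges, lon_edges), weights=values)
    counts, _, _ = np.histogram2d(lat, lon, bins=(lat_edges, lon_edges))
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts  # 0 / 0 -> NaN for cells without data

# Hypothetical toy data in the question's layout (DatetimeIndex named "Date")
df = pd.DataFrame(
    {"lat": [12.9, 12.5, -40.1], "lon": [59.9, 43.9, 100.2], "variable": [10.5, 12.7, 37.5]},
    index=pd.to_datetime(["2019-01-01", "2019-01-01", "2019-01-03"]).rename("Date"),
)

lat_edges = np.linspace(-90, 90, 91)     # 2-degree latitude cells
lon_edges = np.linspace(-180, 180, 145)  # 2.5-degree longitude cells

# One time step per calendar day, including days with no observations at all
days = pd.date_range(df.index.min().normalize(), df.index.max().normalize(), freq="D")

grids = []
for day in days:
    sub = df[df.index.normalize() == day]  # may be empty -> all-NaN grid
    grids.append(grid_one_day(sub["lat"].to_numpy(), sub["lon"].to_numpy(),
                              sub["variable"].to_numpy(), lat_edges, lon_edges))

cube = np.stack(grids)  # shape (time, lat, lon)
```

Passing latitude first to `histogram2d` makes each daily grid come out as (lat, lon), so no transpose is needed before stacking.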
This is how I used the histogram2d function to create 1-degree grid cells:
```python
import numpy as np

# Grid the data with histogram2d
df = df_in.loc['2019']  # take one year at a time (df_in has a DatetimeIndex)

lat = df['lat'].to_numpy()
lon = df['lon'].to_numpy()
z = df['variable'].to_numpy()

# Create 1-degree binning
binlon = np.linspace(-180, 180, 361)
binlat = np.linspace(-90, 90, 181)

# Sum of the values and number of observations in each cell
# (the deprecated `normed` argument is dropped; the default gives raw sums)
zz, xx, yy = np.histogram2d(lon, lat, bins=(binlon, binlat), weights=z)
counts, _, _ = np.histogram2d(lon, lat, bins=(binlon, binlat))

# Avoid dividing by zero: where counts == 0, zi = 0, else zi = zz / counts
has_data = counts > 0
zi = np.zeros_like(zz)
zi[has_data] = zz[has_data] / counts[has_data]
# Mask empty cells (masking zi == 0 would also hide cells whose mean is 0)
zi = np.ma.masked_where(~has_data, zi)

# Final, gridded data:
hist = zi.T  # shape (180, 360)
```
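As an alternative to histograms (a different technique from the one above), each observation can be assigned an integer cell index and averaged with a pandas `groupby`; the means are then scattered into a NaN-filled (time, lat, lon) array. A minimal sketch, assuming daily time steps, cell edges starting at -90/-180, and made-up example rows:

```python
import numpy as np
import pandas as pd

# Hypothetical example data in the question's layout
df = pd.DataFrame(
    {"lat": [12.93025, 12.53044, 12.37841, 13.1],
     "lon": [59.9239, 43.9229, 33.9245, 59.0],
     "variable": [10.459373, 12.730064, 37.487683, 20.0]},
    index=pd.to_datetime(["2019-01-01", "2019-01-03", "2019-01-03", "2019-01-01"]).rename("Date"),
)

DLAT, DLON = 2.0, 2.5
n_lat, n_lon = int(180 / DLAT), int(360 / DLON)  # 90 x 144 cells

# Integer cell index of every observation; clip so lat == 90 / lon == 180 stay in range
ilat = np.clip(np.floor((df["lat"].to_numpy() + 90) / DLAT).astype(int), 0, n_lat - 1)
ilon = np.clip(np.floor((df["lon"].to_numpy() + 180) / DLON).astype(int), 0, n_lon - 1)

# Daily time axis, including days without any observation
days = pd.date_range(df.index.min().normalize(), df.index.max().normalize(), freq="D")
itime = days.get_indexer(df.index.normalize())

# Mean of all observations that fall into the same (day, lat cell, lon cell)
cell_mean = (
    pd.DataFrame({"t": itime, "y": ilat, "x": ilon, "v": df["variable"].to_numpy()})
    .groupby(["t", "y", "x"])["v"]
    .mean()
)

# Scatter the means into a NaN-filled (time, lat, lon) array
cube = np.full((len(days), n_lat, n_lon), np.nan)
t, y, x = (cell_mean.index.get_level_values(k).to_numpy() for k in ("t", "y", "x"))
cube[t, y, x] = cell_mean.to_numpy()
```

Only cells that actually received observations are written, so missing days and locations stay NaN without any special handling.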
Any help in this regard will be much appreciated.
I ended up making sample data and worked on both the 2D and the 3D cases. I'll start with the 2D case that you already have working, because the extension to the 3D case is then very simple.
2D
First, let's create some random sample data. Note that I import everything I need for later here.
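A minimal sketch of such sample data, assuming NumPy's `default_rng` generator; the number of points and the field itself are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed for reproducibility

n_points = 1000
# Hypothetical sample: points scattered uniformly in lat/lon over the globe
lat = rng.uniform(-90, 90, n_points)
lon = rng.uniform(-180, 180, n_points)
# A smooth field plus noise, so the binned result has some visible structure
values = np.cos(np.deg2rad(lat)) * np.sin(np.deg2rad(lon)) + rng.normal(0, 0.1, n_points)
```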
This will serve as random, scattered geospatial data that we want to plot using the histogram function. The next step is to create sensible bins and then do the actual binning.
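A sketch of the binning step on a 2 x 2.5 degree grid, computing the mean value per cell as in the question (sum of weights divided by counts). It regenerates the hypothetical sample data so that it runs on its own; putting latitude first makes the rows of the result correspond to latitude, so no transpose is needed.

```python
import numpy as np

# Hypothetical sample data (same recipe as the previous step)
rng = np.random.default_rng(seed=0)
lat = rng.uniform(-90, 90, 1000)
lon = rng.uniform(-180, 180, 1000)
values = np.cos(np.deg2rad(lat)) * np.sin(np.deg2rad(lon)) + rng.normal(0, 0.1, 1000)

# Bin edges for a 2 x 2.5 degree grid (cells start at -90 / -180)
lat_edges = np.linspace(-90, 90, 91)
lon_edges = np.linspace(-180, 180, 145)

# Sum of values and number of points per cell
sums, _, _ = np.histogram2d(lat, lon, bins=(lat_edges, lon_edges), weights=values)
counts, _, _ = np.histogram2d(lat, lon, bins=(lat_edges, lon_edges))

# Mean per cell, NaN where no point fell into the cell
with np.errstate(invalid="ignore", divide="ignore"):
    binned = sums / counts
```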
Then, we plot both the scattered data and the binned data as follows.
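A possible plotting sketch with matplotlib, assuming the same hypothetical data and grid (the `Agg` backend is only there so it runs without a display). `pcolormesh` takes the bin edges directly and leaves NaN cells blank.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; drop this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample data and 2 x 2.5 degree binning (as in the previous steps)
rng = np.random.default_rng(seed=0)
lat = rng.uniform(-90, 90, 1000)
lon = rng.uniform(-180, 180, 1000)
values = np.cos(np.deg2rad(lat)) * np.sin(np.deg2rad(lon)) + rng.normal(0, 0.1, 1000)
lat_edges = np.linspace(-90, 90, 91)
lon_edges = np.linspace(-180, 180, 145)
sums, _, _ = np.histogram2d(lat, lon, bins=(lat_edges, lon_edges), weights=values)
counts, _, _ = np.histogram2d(lat, lon, bins=(lat_edges, lon_edges))
with np.errstate(invalid="ignore", divide="ignore"):
    binned = sums / counts

fig, (ax_top, ax_bottom) = plt.subplots(2, 1, figsize=(8, 8), sharex=True, sharey=True)
vmin, vmax = np.nanmin(binned), np.nanmax(binned)  # shared color scale

# Top: the raw scattered points, colored by value
ax_top.scatter(lon, lat, c=values, s=5, vmin=vmin, vmax=vmax)
ax_top.set_title("Scattered data")

# Bottom: the gridded means; NaN cells are left blank
mesh = ax_bottom.pcolormesh(lon_edges, lat_edges, binned, vmin=vmin, vmax=vmax)
ax_bottom.set_title("Binned data (2 x 2.5 degrees)")

for ax in (ax_top, ax_bottom):
    ax.set_xlabel("lon")
    ax.set_ylabel("lat")
fig.colorbar(mesh, ax=[ax_top, ax_bottom])
# fig.savefig("binned_2d.png") or plt.show() to look at the result
```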
[Figure: 2D case. Top: the scattered data. Bottom: the binned data.]
3D
Let's continue with the 3D case. Again, let's create some random scattered data that varies in time:
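A sketch of such data, assuming a time coordinate measured in days and a made-up field that drifts eastwards so that the time bins differ visibly:

```python
import numpy as np

rng = np.random.default_rng(seed=1)

n_points = 3000
# Hypothetical sample: each point has a time stamp (in days) as well as a position
t = rng.uniform(0, 4, n_points)  # 4 days of observations
lat = rng.uniform(-90, 90, n_points)
lon = rng.uniform(-180, 180, n_points)
# The pattern moves 45 degrees east per day, so successive time bins look different
values = np.sin(np.deg2rad(lon - 45 * t)) * np.cos(np.deg2rad(lat))
```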
Now, instead of using `histogram2d`, we use `histogramdd`, which is simply the N-dimensional version of the same function.
Finally, we plot the scattered data and the binned data side by side, one row per time bin. Note the normalization that is used so that variations over time are easy to see. There are three loops (I could have put them in a single one, but this is nicer for readability).
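A sketch of the `np.histogramdd` step, assuming the hypothetical time-varying data is regenerated and one bin per day is used. The samples are passed as an (N, 3) array whose columns follow the order of the bin edges, so the result already has the shape (time, lat, lon).

```python
import numpy as np

# Hypothetical time-varying sample data (same recipe as above)
rng = np.random.default_rng(seed=1)
n_points = 3000
t = rng.uniform(0, 4, n_points)
lat = rng.uniform(-90, 90, n_points)
lon = rng.uniform(-180, 180, n_points)
values = np.sin(np.deg2rad(lon - 45 * t)) * np.cos(np.deg2rad(lat))

# Bin edges: one bin per day, 2 x 2.5 degrees in space
t_edges = np.arange(0, 5)  # [0, 1, 2, 3, 4] -> 4 daily bins
lat_edges = np.linspace(-90, 90, 91)
lon_edges = np.linspace(-180, 180, 145)
edges = (t_edges, lat_edges, lon_edges)

# Column order of `sample` must match the order of `edges`
sample = np.column_stack([t, lat, lon])
sums, _ = np.histogramdd(sample, bins=edges, weights=values)
counts, _ = np.histogramdd(sample, bins=edges)

with np.errstate(invalid="ignore", divide="ignore"):
    cube = sums / counts  # shape (time, lat, lon); NaN where a cell is empty
```

For the DataFrame in the question, a numeric time coordinate could be taken as days since the first timestamp, e.g. `(df.index - df.index.min()) / pd.Timedelta('1D')`, with one edge per day.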
[Figure: 3D case.]
Left column: the scattered, random geospatial data, with titles indicating the time bins. Center column: 2D histograms of the data binned in time "by hand". Right column: the slices obtained from a single 3D histogram. As expected, the center and right columns are identical.
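The claim that the center and right columns agree can be checked numerically. This sketch, again on the hypothetical data, compares the per-cell sums from one 3D histogram with 2D histograms computed time bin by time bin; the counts follow the same binning, so comparing sums is enough.

```python
import numpy as np

# Hypothetical time-varying sample data (same recipe as above)
rng = np.random.default_rng(seed=1)
n_points = 3000
t = rng.uniform(0, 4, n_points)
lat = rng.uniform(-90, 90, n_points)
lon = rng.uniform(-180, 180, n_points)
values = np.sin(np.deg2rad(lon - 45 * t)) * np.cos(np.deg2rad(lat))

t_edges = np.arange(0, 5)
lat_edges = np.linspace(-90, 90, 91)
lon_edges = np.linspace(-180, 180, 145)

# Right column: one 3D histogram over (time, lat, lon)
sums3, _ = np.histogramdd(np.column_stack([t, lat, lon]),
                          bins=(t_edges, lat_edges, lon_edges), weights=values)

# Center column: select the points of each time bin "by hand", then bin them in 2D
# (t < 4 always here, so the half-open test matches histogramdd's last bin)
by_hand = np.empty_like(sums3)
for k in range(len(t_edges) - 1):
    in_bin = (t >= t_edges[k]) & (t < t_edges[k + 1])
    by_hand[k], _, _ = np.histogram2d(lat[in_bin], lon[in_bin],
                                      bins=(lat_edges, lon_edges), weights=values[in_bin])

same = np.allclose(sums3, by_hand)
```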
Hope this solves your problem.