Converting time series data to gridded 3D array in python

570 views Asked by At

I have a dataframe with the timeseries data for many years and includes values of variable at different lat lon locations every day. For a given day, the variable is recorded at different locations. Following is a snippet of the dataframe which I am reading in python pandas:

               lat      lon         variable  
Date                                                            
2017-12-31  12.93025  59.9239     10.459373     
2019-12-31  12.53044  43.9229     12.730064     
2019-02-28  12.37841  33.9245     37.487683  

I want to:

  1. Grid it to 2x2.5 degrees resolution
  2. Make a 3D array which includes the gridded data as well its time variation. I want to get a gridded dataset as an array with the shape (time, lat, lon). This is because the dataframe that I grid at a certain resolution has to be compared with global meteorology data with a resolution of 2x2.5 degrees. (Also, my dataset does not record data from all locations on all days and will have to take care of the missing data while creating the final array).

I have looked into geopandas, xarray and histogram2d for gridding the data. I have also successfully gridded the data using histigram2d function. However, could only achieve a 2D array which lacks time information making my analysis a challenge. I know, ideally I should concatenate the time dimesion to my 2D array but struggling with how exactly to do so given that not all locations record data at all times.

This is how I used the histogram2d function for creating 1degree grid cells:

**

#Plot histogram2d - for gridding the data:
df=df_in['2019'] #taking one year at a time
# Test data, globally distributed
lat_r = df['lat']
lon_r = df['lon']
z_r = df['variable']
lat = np.array(lat_r)
lon = np.array(lon_r)
z = np.array(z_r)
    
# Create binning
binlon = np.linspace(-180,180, 361)
binlat = np.linspace(-90, 90, 181)
zz, xx, yy = np.histogram2d(lon, lat, bins=(binlon, binlat), weights=z, normed=False)
counts, _, _= np.histogram2d(lon, lat, bins=(binlon, binlat))\

# Workaround for zero count values tto not get an error.
# Where counts == 0, zi = 0, else zi = zz/counts
zi = np.zeros_like(zz)
zi[counts.astype(bool)] = zz[counts.astype(bool)]/counts[counts.astype(bool)]
zi = np.ma.masked_equal(zi, 0)

#Final, gridded data:
hist = zi.T # shape(180,360)

**

Any help in this regard will be much appreciated.

1

There are 1 answers

0
lsawade On

I ended up making sample data and worked on both the 2D and the 3D case. I'll start with the 2D case that you already have working because the extension to the 3D case is then very simple.

2D

First, let's create some random sample data. Note that I import everything I need for later here

import numpy as np
import matplotlib.pyplot as plt
import cartopy
from cartopy.crs import PlateCarree
from matplotlib.colors import Normalize

def create2Ddata():
    '''Makes some random data'''

    N = 2000
    lat = 10 * np.random.rand(N) + 40
    lon = 25 * np.random.rand(N) - 80
    z = np.sin(4*np.pi*lat/180.0*np.pi) + np.cos(8*np.pi*lon/180.0*np.pi)

    return lat, lon, z

# Create Data
lat, lon, z = create2Ddata()

This will serve as some random, scattered, geospatial, data that we want to plot using the histogram function. The next step is then to both create bins that make sense followed by actually binning.


def make2dhist(lon, lat, z, latbins, lonbins):
    '''Takes the inputs and creates 2D histogram'''
    zz, _, _ = np.histogram2d(lon, lat, bins=(
        lonbins, latbins), weights=z, normed=False)
    counts, _, _ = np.histogram2d(lon, lat, bins=(lonbins, latbins))\

    # Workaround for zero count values to not divide by zero.
    # Where counts == 0, zi = 0, else zi = zz/counts
    zi = np.zeros_like(zz)
    zi[counts.astype(bool)] = zz[counts.astype(bool)] / \
        counts[counts.astype(bool)]
    zi = np.ma.masked_equal(zi, 0)

    return lonbins, latbins, zi

# Make bins
latbins = np.linspace(np.min(lat), np.max(lat), 75)
lonbins = np.linspace(np.min(lon), np.max(lon), 75)

# Bin the data
_, _, zi = make2dhist(lon, lat, z, latbins, lonbins)

Then, we plot both the scattered data and the binned data as follows.


def plotmap():
    '''background map plotting'''

    ax = plt.gca()
    ax.add_feature(cartopy.feature.LAND, zorder=0, edgecolor='None',
                   linewidth=0.5, facecolor=(0.8, 0.8, 0.8))
    ax.spines['geo'].set_linewidth(0.75)


fig = plt.figure()

# Just plot the scattered data
ax = plt.subplot(211, projection=PlateCarree())
plotmap()
plt.scatter(lon, lat, s=7, c=z, cmap='rainbow')

# Plot the binned 2D data
ax = plt.subplot(212, projection=PlateCarree())
plotmap()
plt.pcolormesh(
    lonbins, latbins, zi.T, shading='auto', transform=PlateCarree(),
    cmap='rainbow')
plt.show()

Figure 2D, not allowed to embed figures yet...

At the top, the scattered data, at the bottom the binned data.


3D

Let's continue with the 3D case. Again, let's create some random scattered data that varies in time:


def create3Ddata():
    ''' Make random 3D data '''
    N = 8000
    lat = 10 * np.random.rand(N) + 40
    lon = 25 * np.random.rand(N) - 80
    t = 10 * np.random.rand(N)

    # Linearly changes sign of the cos+sin wavefield
    z = (t/5 - 1) * (np.sin(2*2*np.pi*lat/180.0*np.pi)
                     + np.cos(4*2*np.pi*lon/180.0*np.pi))

    return lat, lon, t, z


# Create Data
lat, lon, t, z = create3Ddata()

Now, instead of using histogram2d here, we will use histogramdd, which is just the N-dimensional version of the same function.


def make3dhist(lon, lat, t, z, latbins, lonbins, tbins):
    '''Takes the inputs and creates 3D histogram just as the 2D histogram
    function'''
    zz, _ = np.histogramdd(
        np.vstack((lon, lat, t)).T,
        bins=(lonbins, latbins, tbins),
        weights=z, normed=False)

    counts, _ = np.histogramdd(
        np.vstack((lon, lat, t)).T,
        bins=(lonbins, latbins, tbins))
    # Workaround for zero count values tto not get an error.
    # Where counts == 0, zi = 0, else zi = zz/counts
    zi = np.zeros_like(zz)
    zi[counts.astype(bool)] = zz[counts.astype(bool)] / \
        counts[counts.astype(bool)]
    zi = np.ma.masked_equal(zi, 0)
    return lonbins, latbins, tbins, zi

# Create bins
latbins = np.linspace(np.min(lat), np.max(lat), 75)
lonbins = np.linspace(np.min(lon), np.max(lon), 75)
tbins = np.linspace(np.min(t), np.max(t), 5)

# Bin the data
_, _, _, zi = make3dhist(lon, lat, t, z, latbins, lonbins, tbins)

Finally, we plot both the scattered data and the binned data side by side in respective time bins. Note the normalization that is used to make sure variations in time are easily observed. Note that there are three loops (I could have put them in a single one, but this is nicer for readability).

  1. The first loop bins the data in time and plots the binned data in one slice each.
  2. The second loop bins the data in time, then in space using the 2D histogram function from earlier, and plots a slice for each time bin.
  3. The third function uses the already 3D binned data from above and plots the slices as well by accessing slices in the 3D matrix.

# Normalize the colors so that variations in time are easily seen
norm = Normalize(vmin=-1.0, vmax=1.0)

fig = plt.figure(figsize=(12, 10))

# The scattered data in time bins
# Left column
for i in range(4):
    ax = plt.subplot(4, 3, 3*i + 1, projection=PlateCarree())
    plotmap()

    # Find points in time bins
    pos = np.where((tbins[i] < t) & (t < tbins[i+1]))

    # Plot scatter points
    plt.title(f'{tbins[i]:0.2f} < t < {tbins[i+1]:0.2f}')
    plt.scatter(lon[pos], lat[pos], c=z[pos], s=7, cmap='rainbow', norm=norm)
    plt.colorbar(orientation='horizontal', pad=0.0)

# Center column
for i in range(4):
    ax = plt.subplot(4, 3, 3*i + 2, projection=PlateCarree())
    plotmap()
    plt.title(f'{tbins[i]:0.2f} < t < {tbins[i+1]:0.2f}')

    # Find data points in time bins
    pos = np.where((tbins[i] < t) & (t <= tbins[i+1]))

    # Bin the data for each time bin separately
    _, _, zt = make2dhist(lon[pos], lat[pos], z[pos], latbins, lonbins)
    plt.pcolormesh(
        lonbins, latbins, zt.T, shading='auto', transform=PlateCarree(),
        cmap='rainbow', norm=norm)
    plt.colorbar(orientation='horizontal', pad=0.0)

# Right column
for i in range(4):
    ax = plt.subplot(4, 3, 3*i + 3, projection=PlateCarree())
    plotmap()
    plt.title(f'{tbins[i]:0.2f} < t < {tbins[i+1]:0.2f}')
    plt.pcolormesh(
        lonbins, latbins, zi[:, :, i].T, shading='auto', transform=PlateCarree(),
        cmap='rainbow', norm=norm)
    plt.colorbar(orientation='horizontal', pad=0.0)

plt.show()

Figure 3D, not allowed to embed figures yet...

In the left column, the scattered, random, geospatial data, where the titles indicate the bins. In the center column, the 2D histograms using "by hand" time-binned data. In the right column, the slices that were binned using a 3D histogram. As expected center and right columns show the exact same thing.

Hope this solves your problem.