How to merge two or three 3D arrays in python?

5.6k views Asked by At

I have time series data in hdf format. I use the code below to read the data from the hdf files. Now I tried to join data on the basis of latitude and longitude for those data having same jdn (julian day number). Data with same julian day number represent the continuous spatial data

import glob
import numpy as np
import os
from pyhdf.SD import SD,SDC


files = glob.glob('MOD04*')
files.sort()
for f in files:
    product = f[0:5]+ '-Atmospheric Product'
    year = f[10:14]
    jdn = f[14:17] # julian day number

    # Read dataset.
    hdf = SD(f, SDC.READ)
    data3D = hdf.select('Deep_Blue_Aerosol_Optical_Depth_550_Land')
    data = data3D[:,:].astype(np.double)

    # Read geolocation dataset 
    lat = hdf.select('Latitude')
    latitude = lat[:,:]
    lon = hdf.select('Longitude')
    longitude = lon[:,:]

my data are attached in this link: https://drive.google.com/folderview?id=0B2rkXkOkG7ExX2lTTWEySU1fOWc&usp=sharing

2

There are 2 answers

5
Heather QC On BEST ANSWER

Numpy's hstack, vstack, or dstack (depending on the axis you'd like to join the arrays) will join multidimensional arrays.

Note that for MODIS aerosol data specifically, using hstack to join the arrays will occasionally throw an error because sometimes the arrays are 203 x 135 and sometimes 204 x 135 so the horizontal dimension won't always match

Building on your code (not pretty, but functional):

import glob
import numpy as np
import os
from pyhdf.SD import SD,SDC


files = glob.glob('MOD04*')
files.sort()
for n, f in enumerate(files):
    product = f[0:5]+ '-Atmospheric Product'
    year = f[10:14]
    jdn = f[14:17] # julian day number

    # Read dataset.
    hdf = SD(f, SDC.READ)
    data3D = hdf.select('Deep_Blue_Aerosol_Optical_Depth_550_Land')
    data = data3D[:,:].astype(np.double)

   # Read geolocation dataset 
    lat = hdf.select('Latitude')
    latitude = lat[:,:]
    lon = hdf.select('Longitude')
    longitude = lon[:,:]

    if n != 0 and jdn != old_jdn:
        #do analysis; write to file for later analysis; etc.
        pass

    if n == 0 or jdn != old_jdn:
        data_timeseries = data
        latitude_timeseries = latitude
        longitude_timeseries = longitude
    else:
        data_timeseries = np.vstack((data_timeseries, data))
        latitude_timeseries = np.vstack((latitude_timeseries, latitude))
        longitude_timeseries = np.vstack((longitude_timeseries, longitude))

    print data_timeseries.shape
    print latitude_timeseries.shape
    print longitude_timeseries.shape

    old_jdn = jdn
0
Gabriel123 On

just to follow up on Heather QC' answer, here is an illustration of the np.stack functions and which dimensions are concerned:

arr1 = np.array([[[1,2],[2,3]],
                 [[1,2],[2,3]],
                 [[1,2],[2,3]]])

arr2 = np.array([[[5,6],[8,7]],
                 [[7,6],[7,8]],
                 [[6,7],[7,8]]])

print("arr1 shape  ", arr1.shape)    
print("arr2 shape  ", arr2.shape)    
print("vstack shape", np.vstack((arr1, arr2)).shape)
print("hstack shape", np.hstack((arr1, arr2)).shape)
print("dstack shape", np.dstack((arr1, arr2)).shape)

>>> arr1 shape   (3, 2, 2)
>>> arr2 shape   (3, 2, 2)
>>> vstack shape (6, 2, 2)
>>> hstack shape (3, 4, 2)
>>> dstack shape (3, 2, 4)