IndexError: size of data array does not conform to slice


Why would I get this error only on subsequent executions of a Python function?

I am running a Python script that converts one kind of netCDF4 file to another by calling a function in a module that I wrote.

The script processes several files in sequence. When I get to the second file in the list, I get an "IndexError: size of data array does not conform to slice" at "data['time'][:]" in this bit of code in my function:

varobj = cdf.createVariable('time','f8',('time'))
varobj.setncatts(dictifyatts(data['time'],''))
varobj[:] = data['time'][:]
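
The message means that the array being assigned does not have the same number of elements as the slice it is assigned to. A minimal check along these lines (a sketch only, using the same cdf and data objects as above) makes the two sizes explicit:

# sketch: compare the target dimension's length with the source variable's length
nlen = len(cdf.dimensions['time'])    # length of the output file's time dimension
if nlen != data['time'].shape[0]:     # length of the source variable
    raise ValueError('time sizes differ: %d vs %d' % (nlen, data['time'].shape[0]))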

It doesn't matter what the file is. The script always happily processes the first file and then chokes on the second, i.e. the first time it invokes the function it is OK, and the second time it fails.

Using a debugger, I inspected varobj and data['time'] on both invocations, as follows.

The second time the function is called, inspecting the variables reveals:

ipdb> data['time']
<class 'netCDF4._netCDF4.Variable'>
float64 time(time)
    description: time of measurement
    calendar: gregorian
    units: seconds since 1970-01-01T00:00:00 UTC
path = /Data/Burst
unlimited dimensions: 
current shape = (357060,)
filling off


ipdb> varobj
<class 'netCDF4._netCDF4.Variable'>
float64 time(time)
    description: time of measurement
    calendar: gregorian
    units: seconds since 1970-01-01T00:00:00 UTC
unlimited dimensions: 
current shape = (357056,)
filling on, default _FillValue of 9.969209968386869e+36 used

The first time the function is called, inspecting the variables reveals exactly the same output, except that the two shapes are the same size.

This same error is reported here: Error when creating variable to create a netCDF file

Based on that answer, I tried the following code instead:

cf_time = data['time'][:]
cdf.createVariable('time','f8',('time'))
cdf['time'].setncatts(dictifyatts(data['time'],''))
cdf['time'][:] = cf_time[:]

That did not work either: same error under the same circumstances.

I'm out of ideas and could use suggestions on what to check for next.

Thanks, Bart, for spotting the shape change. That was a big clue; I had been checking file names.

When I investigated the shape change, I found that within my function, one of the input variables was holding information from the previous time the function was called.
First, why would only one of the input variables hold on to stale information?
Second, this should not happen at all; the variable should be out of scope.

I will try to reproduce this behavior in minimized code. In the meantime, answers to the question about scope in Python would be appreciated -- I thought I understood how Python handles scope.

Here is minimal code which demonstrates the problem. Somehow the function call can change a variable (good_ens) that should be out of scope.

def doFile(infileName, outfileName, goodens, timetype, flen):

    print('infilename = %s' % infileName)
    print('outfilename = %s' % outfileName)
    print('goodens at input are from %d to %d' % (goodens[0],goodens[1]))
    print('timetype is %s' % timetype)

    maxens = flen # fake file length
    print('%s time variable has %d ensembles' % (infileName,maxens))

    # TODO - goodens[1] has the file size from the previous file run when multiple files are processed!
    if goodens[1] < 0:
        goodens[1] = maxens

    print('goodens adjusted for input file length are from %d to %d' % (goodens[0],goodens[1]))

    nens = goodens[1]-goodens[0]
    print('creating new netCDF file %s with %d records (should match input file)' % (outfileName, nens))



datapath = ""

datafiles = ['file0.nc',\
             'file1.nc',\
             'file2.nc',\
             'file3.nc']
# fake file lengths for this demonstration
datalengths = [357056, 357086, 357060, 199866]
outfileroot = 'outfile'
attFile = datapath + 'attfile.txt'
# this gets changed!  It should never be changed!
# ask for all ensembles in the file
good_ens = [0,-1]

 # --------------  beyond here the user should not need to change things
for filenum in range(len(datafiles)):

    print('\n--------------\n')
    print('Input Parameters before function call')
    print(good_ens)
    inputFile = datapath + datafiles[filenum]
    print(inputFile)
    l = datalengths[filenum]
    print(l)
    outputFile = datapath + ('%s%03d.cdf' % (outfileroot,filenum))
    print(outputFile)

    print('Converting from %s to %s' % (inputFile,outputFile))
    # the variable good_ens gets changed by this function call, and should not be
    doFile(inputFile, outputFile, good_ens, 'CF', l)
    # this works, but will not work for me in using this function
    #doNortekRawFile(inputFile, outputFile, [0,-1], 'CF', l)
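
The effect can be reduced even further; this sketch (with a throwaway function name, adjust) is enough to reproduce the "out of scope" change:

def adjust(ens, n):
    if ens[1] < 0:
        ens[1] = n   # this writes through to the caller's list

good_ens = [0, -1]
adjust(good_ens, 357056)
print(good_ens)      # prints [0, 357056], not [0, -1]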

There are 2 answers

Marinna Martini:

So the problem here came from an old C programmer (me) misunderstanding how Python passes objects to functions. I reduced the code, isolated the problem, and posted the issue here: python variable contents changed by function when no change is intended. It has been answered there: Python always passes references to objects, unlike C, which is explicit about whether a pointer or the contents are being passed.
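
A minimal sketch of the fix, given that behavior, is to copy the mutable argument before modifying it, either inside the function or at the call site (the function below is a trimmed, hypothetical version of doFile from the question):

def doFile(infileName, outfileName, goodens, timetype, flen):
    goodens = list(goodens)   # work on a copy; the caller's list stays [0, -1]
    if goodens[1] < 0:
        goodens[1] = flen
    # ... rest of the function unchanged ...

# or, equivalently, pass a copy at the call site:
# doFile(inputFile, outputFile, list(good_ens), 'CF', l)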

benschbob91:

I came here because I had the same error when trying to write a large xarray dataset to a netCDF file. It turned out that I had to re-chunk the dataset into uniform chunks, with no leftover partial chunks. In Dask this is done with compute_chunk_sizes(); in xarray you can specify the chunks with arr.chunk(). See the xarray documentation for .chunk().
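
A minimal sketch of that re-chunking step (the file names and the 'time' dimension are assumptions for illustration, not from the original code):

import xarray as xr

ds = xr.open_dataset('big_input.nc')   # hypothetical input file
ds = ds.chunk({'time': 10000})         # uniform chunks along the 'time' dimension
ds.to_netcdf('big_output.nc')          # write the re-chunked dataset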