EOF Error During Dict Slice


I am trying to merge monthly data into an existing JSON file that I loaded with the json module (import json). Initially, each entry in my JSON data had just one property, 'name':

json_data['features'][1]['properties']
>>{'name':'John'}

But the end result with the monthly data I want is like this:

json_data['features'][1]['properties']

>>{'name':'John',
'2016-01': {'x1':0, 'x2':0, 'x3':1, 'x4':0},
'2016-02': {'x1':1, 'x2':0, 'x3':1, 'x4':0}, ... }

My monthly data are on separate tsv files. They have this format:

John    0    0    1    0
Jane    1    1    1    0

So I loaded them with the csv module (import csv), looped over a list of file names, and collected them in one dictionary like so:

file_strings = ['2016-01.tsv', '2016-02.tsv', ... ]
collective_dict = {}
for i in strings:
    with open(i) as f:
        tsv_object = csv.reader(f, delimiter='\t')
        collective_dict[i[:-4]] = rows[0]:rows[1:5] for rows in tsv_object

I checked how things turned out by slicing collective_dict like so:

collective_dict['2016-01']['John'][0]
>>'0'

Which is correct; it just needs to be cast into an integer.

For my next feat, I attempted to assign all of the monthly data to the respective json members as part of their external properties:

for i in file_strings:
    for j in range(len(json_data['features'])):
        json_data['features'][j]['properties'][i[:-4]] = {}
        json_data['features'][j]['properties'][i[:-4]]['x1'] = int(collective_dict[i[:-4]][json_data['features'][j]['properties']['name']][0])
        json_data['features'][j]['properties'][i[:-4]]['x2'] = int(collective_dict[i[:-4]][json_data['features'][j]['properties']['name']][1])
        json_data['features'][j]['properties'][i[:-4]]['x3'] = int(collective_dict[i[:-4]][json_data['features'][j]['properties']['name']][2])
        json_data['features'][j]['properties'][i[:-4]]['x4'] = int(collective_dict[i[:-4]][json_data['features'][j]['properties']['name']][3])

Here I got an arrow pointing at the last few characters:

SyntaxError: unexpected EOF while parsing

It is a pretty complicated slice, so I suppose user error is not to be ruled out. However, I double- and triple-checked things. I also looked up this error, and it seems to come up mostly with input()-related calls. I'm left a bit confused: I don't see where I made a mistake (although I'm mentally prepared to accept that I did).

My only guess was that something somewhere was not a string. When I checked collective_dict and json_data, everything that was supposed to be a string was a string ('John', 'Jane', et al.). So I guess it's something else.

I made the problem as simple as I could while keeping the original structure of the data and for loops and so forth. I'm using Python 3.6.

Question

Why am I getting the EOF error? How can I build my external properties data without encountering such an error?


There is 1 answer

Thomas Fauskanger On BEST ANSWER

Here I have rewritten your last code block to:

for i in file_strings:
    file_name = i[:-4]
    for j in range(len(json_data['features'])):
        name = json_data['features'][j]['properties']['name']
        file_dict = json_data['features'][j]['properties'][file_name] = {}
        for x in range(4):
            x_string = 'x{}'.format(x+1)
            file_dict[x_string] = int(collective_dict[file_name][name][x])

from:

for i in file_strings:
    for j in range(len(json_data['features'])):
        json_data['features'][j]['properties'][i[:-4]] = {}
        json_data['features'][j]['properties'][i[:-4]]['x1'] = int(collective_dict[i[:-4]][json_data['features'][j]['properties']['name']][0])
        json_data['features'][j]['properties'][i[:-4]]['x2'] = int(collective_dict[i[:-4]][json_data['features'][j]['properties']['name']][1])
        json_data['features'][j]['properties'][i[:-4]]['x3'] = int(collective_dict[i[:-4]][json_data['features'][j]['properties']['name']][2])
        json_data['features'][j]['properties'][i[:-4]]['x4'] = int(collective_dict[i[:-4]][json_data['features'][j]['properties']['name']][3])

That is just to make it a bit more readable; it shouldn't change the behavior.
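The same loop can also be written by iterating over the features directly instead of indexing with range(len(...)). This is only a sketch: the sample json_data and collective_dict below are made up to match the shapes described in the question, and the names month, feature and properties are mine.

```python
# Hypothetical sample data shaped like the question's json_data and collective_dict.
json_data = {'features': [{'properties': {'name': 'John'}},
                          {'properties': {'name': 'Jane'}}]}
collective_dict = {
    '2016-01': {'John': ['0', '0', '1', '0'], 'Jane': ['1', '1', '1', '0']},
}
file_strings = ['2016-01.tsv']

for file_string in file_strings:
    month = file_string[:-4]  # strip the '.tsv' extension
    for feature in json_data['features']:
        properties = feature['properties']
        values = collective_dict[month][properties['name']]
        # Build {'x1': ..., 'x2': ..., ...} from the string values of this row.
        properties[month] = {
            'x{}'.format(n): int(value)
            for n, value in enumerate(values, start=1)
        }

print(json_data['features'][0]['properties'])
```

Looking up the name and the month once per feature keeps each line short, which also makes an unbalanced bracket much easier to spot.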

A thing I noticed in your other part of code is the following:

collective_dict[i[:-4]] = rows[0]:rows[1:5] for rows in tsv_object

The part I am referring to is = rows[0]:rows[1:5] for rows in tsv_object. In my IDE that does not work. I'm not sure whether it is a typo in your question or whether it is actually in your code, but I imagine you want it to be

collective_dict[i[:-4]] = {rows[0]:rows[1:5] for rows in tsv_object}

or something like that. I'm not sure whether that could confuse the parser into thinking that there is an error at the end of the file.
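One way to check what the parser makes of a line is to hand it to the built-in compile() and look at the resulting SyntaxError. The sketch below uses made-up in-memory rows and a hypothetical helper, syntax_error_message. The unclosed-parenthesis snippet is my own example of a typical way to hit an end-of-input error, not something taken from your code, and the exact wording of the messages depends on the Python version (3.6 reports an unclosed bracket as "unexpected EOF while parsing").

```python
import csv
import io

# Hypothetical in-memory stand-in for one of the .tsv files.
tsv_text = "John\t0\t0\t1\t0\nJane\t1\t1\t1\t0\n"

# With the braces, the dict comprehension parses and runs.
tsv_object = csv.reader(io.StringIO(tsv_text), delimiter='\t')
month_dict = {rows[0]: rows[1:5] for rows in tsv_object}
print(month_dict['John'])


def syntax_error_message(source):
    """Return the SyntaxError message for source, or None if it compiles."""
    try:
        compile(source, '<snippet>', 'exec')
    except SyntaxError as error:
        return error.msg
    return None


# Without the braces, the line is rejected before anything runs.
no_braces = "d = rows[0]:rows[1:5] for rows in tsv_object\n"
# A missing closing parenthesis leaves the parser waiting for more input.
unclosed = "x = int(collective_dict['2016-01']['John'][0]\n"

print(syntax_error_message(no_braces))
print(syntax_error_message(unclosed))
```

If the long assignment lines in your real code are missing a closing ) or ], that would be worth ruling out first, since the parser only notices the imbalance once it runs out of input.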

The ValueError: Invalid literal for int()

If your tsv-data is

John    0    0    1    0
Jane    1    1    1    0

Then calling int() on each string value should be no problem. E.g., int('42') returns the integer 42. However, if one or several lines in your files are malformed, use something like this block of code to figure out which file and line it is:

import csv

file_strings = ['2016-01.tsv', '2016-02.tsv', ... ]
collective_dict = {}
for file_name in file_strings:
    print('Reading {}'.format(file_name))
    with open(file_name) as f:
        tsv_object = csv.reader(f, delimiter='\t')
        # Unpack each row into the name and the remaining values;
        # start=1 makes the reported line numbers 1-based.
        for line_no, (name, *x_values) in enumerate(tsv_object, start=1):
            if len(x_values) != 4:
                print('Line {}: found {} values, expected 4'.format(line_no, len(x_values)))
            try:
                intx = [int(x) for x in x_values]
            except ValueError as e:
                # Catch "invalid literal for int()"
                print('Line {}: {}'.format(line_no, e))
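The check above only prints what it finds. If you also want the converted values, the same idea can be wrapped in a function that returns both the parsed rows and a list of problems. This is a sketch under assumptions: parse_month is a hypothetical helper name, and the sample text is invented so the example runs without any files on disk.

```python
import csv
import io


def parse_month(lines):
    """Parse TSV rows into {name: [int, ...]} plus a list of problem messages."""
    parsed, problems = {}, []
    for line_no, row in enumerate(csv.reader(lines, delimiter='\t'), start=1):
        if not row:
            continue  # skip blank lines
        name, *x_values = row
        if len(x_values) != 4:
            problems.append(
                'Line {}: expected 4 values, got {}'.format(line_no, len(x_values)))
            continue
        try:
            parsed[name] = [int(x) for x in x_values]
        except ValueError as e:
            # int() failed on at least one value in this row.
            problems.append('Line {}: {}'.format(line_no, e))
    return parsed, problems


# Hypothetical file contents: the second row has a bad value, the third is short.
sample = io.StringIO("John\t0\t0\t1\t0\nJane\t1\tx\t1\t0\nJack\t1\t1\n")
parsed, problems = parse_month(sample)
print(parsed)
for problem in problems:
    print(problem)
```

With real files, you would pass the open file object instead of the StringIO and store the first return value under the month key in collective_dict. The values are then already integers, so the later int() calls are no longer needed.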