Parsing JSON files


I am trying to open some JSON files, look up a publication year in each, and sort them accordingly. Before doing that, I decided to experiment on a single file. I am having trouble, though: I can load the file and get the strings, but when I try to print one word, it prints a single character instead.

For example:

print(data2[1])
# prints: THE BRIDES ORNAMENTS, Viz. Fiue MEDITATIONS, Morall and Diuine.

print(data2[1][0])
# expected: THE
# actual:   T

This is my code right now:

import json

# path is the location of the JSON file, set elsewhere
with open(path) as json_data:
    data = json.load(json_data)

data2 = []

# Flatten each entry into its section title followed by its content items
for entry in data:
    data2.append(entry['section'])
    data2.extend(entry['content'])

There is 1 answer

Answer by yvyas

I would need to see your JSON file to be absolutely sure, but it looks like data2 is a list of strings, so data2[1] is a string. Indexing it with data2[1][0] therefore gives exactly what you are seeing: the character at index 0 of that string.

>>> data2[1]
'THE BRIDES ORNAMENTS, Viz. Fiue MEDITATIONS, Morall and Diuine.'
>>> data2[1][0]
'T'
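
To check this without the original file, here is a minimal sketch that rebuilds data2 from an inline JSON string. The structure is an assumption inferred from the loop in the question (a list of objects, each with a 'section' string and a 'content' list of strings), and the "Title page" section name is made up. Only the title string is taken from the question.

```python
import json

# Hypothetical sample; the real file's layout may differ.
raw = '''[
  {"section": "Title page",
   "content": ["THE BRIDES ORNAMENTS, Viz. Fiue MEDITATIONS, Morall and Diuine."]}
]'''

data = json.loads(raw)

# Same flattening as in the question: section title, then each content string.
data2 = []
for entry in data:
    data2.append(entry["section"])
    data2.extend(entry["content"])

element = data2[1]       # a whole string, not a list of words
first_char = element[0]  # indexing a string yields a single character
print(first_char)
```

If the real file nests its text differently, data2 may hold something other than plain strings, so it is worth printing type(data2[1]) on the actual data.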

A naive way to get the first word is to split the string on whitespace:

>>> data2[1].split()
['THE', 'BRIDES', 'ORNAMENTS,', 'Viz.', 'Fiue', 'MEDITATIONS,', 'Morall', 'and', 'Diuine.']
>>> data2[1].split()[0]
'THE'

However, the punctuation stays attached to the words (for example 'ORNAMENTS,' and 'Diuine.'), so you probably need to tokenize the text. This link should help: http://www.nltk.org/_modules/nltk/tokenize.html
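
As a rough illustration of splitting into words without the trailing punctuation, here is a minimal sketch that uses the standard library's re module rather than NLTK. It is a simple stand-in, not a full tokenizer: the pattern \w+ is an assumption about what counts as a word, and it would split an apostrophized form such as BRIDE'S into two pieces.

```python
import re

text = "THE BRIDES ORNAMENTS, Viz. Fiue MEDITATIONS, Morall and Diuine."

# \w+ matches runs of letters, digits, and underscores, so commas and
# periods are dropped instead of sticking to the neighbouring word.
words = re.findall(r"\w+", text)

print(words[0])
```

For text with contractions, hyphenated words, or abbreviations that should keep their period, a dedicated tokenizer is likely a better fit than this pattern.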