Python: ijson.parse(in_file) vs json.load(in_file)


I am trying to read a large JSON file (~2 GB) in Python.

The following code works well on small files but fails with a MemoryError on the second line for large ones:

import json
import sys

in_file = open(sys.argv[1], 'r')
posts = json.load(in_file)

I looked at similar posts and almost everyone suggested using ijson, so I decided to give it a try:

import ijson
import sys

in_file = open(sys.argv[1], 'r')
posts = list(ijson.parse(in_file))

This handled the large file, but ijson.parse doesn't return a JSON object the way json.load does, so the rest of my code failed:

TypeError: tuple indices must be integers or slices, not str

If I print out posts when using json.load, the output looks like normal JSON:

[{"Id": "23400089", "PostTypeId": "2", "ParentId": "23113726", "CreationDate": ... etc

If I print out posts after using ijson.parse, the output is a list of tuples instead:

[["", "start_array", null], ["item", "start_map", null], 
 ["item", "map_key", "Id"], ["item.Id", "string ... etc

My question: I don't want to change the rest of my code, so is there any way to convert the output of ijson.parse(in_file) back into a JSON object so that it's exactly the same as if I had used json.load(in_file)?

1 Answer

Answered by flashback:

Maybe this works for you:

import ijson
import sys

in_file = open(sys.argv[1], 'r')
posts = []
# 'item' selects each element of the top-level JSON array; ijson.items
# yields them one at a time as ordinary Python objects (dicts, lists, ...)
for post in ijson.items(in_file, 'item'):
    posts.append(post)