I'm trying to load a very large JSONL file (>50 GB) in chunks with pandas:
import pandas as pd

# Read the file lazily, 10,000 lines at a time, instead of all at once
reader = pd.read_json("January.jsonl", lines=True, chunksize=10000)
for chunk in reader:
    df = chunk
This code starts, runs for a while, and then raises this error:
    self._parse_no_numpy()
  File "C:\Users\anaconda3\lib\site-packages\pandas\io\json\_json.py", line 1089, in _parse_no_numpy
    loads(json, precise_float=self.precise_float), dtype=None
ValueError: Expected object or value
You seem to have malformed JSON data in your file. For example, try loading the following "JSON" data - note that id 77 is malformed.
Then run this code.
And view the output:
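As a minimal sketch of such a reproduction: the records below are made up for illustration, the record with id 77 is deliberately missing a value, and `try_load` is a hypothetical helper name. The exact error text may vary with the pandas version and with the kind of malformation, so the sketch only shows that a `ValueError` is raised.

```python
import io

import pandas as pd

# Made-up sample records (an assumption, not the original data).
# The line with id 77 has a key with no value, so it is not valid JSON.
sample = (
    '{"id": 76, "name": "alpha"}\n'
    '{"id": 77, "name": }\n'
    '{"id": 78, "name": "gamma"}\n'
)


def try_load(text):
    """Return the parsed DataFrame, or the ValueError raised while parsing."""
    try:
        return pd.read_json(io.StringIO(text), lines=True)
    except ValueError as exc:
        return exc


result = try_load(sample)
print(type(result).__name__, result)
```

Removing or repairing the id 77 line should let the same call return a DataFrame.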
The error is the same as the one you received. You will need to find the malformed data and fix it. One way is to read the file line by line, parse each line on its own, and record the line numbers that fail so you can extract and inspect them.
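The line-by-line check could be sketched roughly as follows. It uses the standard-library `json` module rather than pandas, so what it accepts may differ slightly from the pandas parser. The function name `find_bad_lines`, the `limit` parameter, and the UTF-8 encoding are assumptions made for illustration.

```python
import json


def find_bad_lines(path, limit=10):
    """Return (line_number, error_message, raw_line) for lines that fail to parse.

    Stops after `limit` bad lines so a badly damaged file does not
    produce an enormous report.
    """
    bad = []
    # Iterating over the file object reads one line at a time,
    # so the whole file is never held in memory.
    with open(path, "r", encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # blank lines are skipped here (an assumption)
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                bad.append((lineno, str(exc), line.rstrip("\n")))
                if len(bad) >= limit:
                    break
    return bad
```

Calling `find_bad_lines("January.jsonl")` returns a list of the first few offending lines with their 1-based line numbers, which you can then inspect and repair or drop before rerunning the chunked `read_json`.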