JSON to newline-delimited JSON


I'm trying to convert a JSON file to NDJSON (newline-delimited JSON). I'm reading the file from GCS (Google Cloud Storage). Sample data:

{
  "Item1" : "INT",
  "Item2" : "INT",
  "Item3" : "text",
  "Item4" : "text",
  "Item5" : "Date"
}{
  "Item1" : "INT",
  "Item2" : "INT",
  "Item3" : "text",
  "Item4" : "text",
  "Item5" : "Date"
}{
  "Item1" : "INT",
  "Item2" : "INT",
  "Item3" : "text",
  "Item4" : "text",
  "Item5" : "Date"
}

The following is my code:

bucket = client.get_bucket('bucket name')
# Name of the object to be stored in the bucket
object_name_in_gcs_bucket = bucket.get_blob('file.json')
object_to_string = object_name_in_gcs_bucket.download_as_string()
#json_data = ndjson.loads(object_to_string)
json_list = [json.loads(row.decode('utf-8')) for row in object_to_string.split(b'\n') if row]

The error I'm receiving is at json_list: json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 3 (char 2)

Required output:

{"Item1" : "INT","Item2" : "INT","Item3" : "text","Item4" : "text","Item5" : "Date"}
{"Item1" : "INT","Item2" : "INT","Item3" : "text","Item4" : "text","Item5" : "Date"}
{"Item1" : "INT","Item2" : "INT","Item3" : "text","Item4" : "text","Item5" : "Date"}

There is 1 answer

Answered by OneLiner

I think your main problem is that you are splitting on line endings instead of on the closing brace of each object, so every piece you pass to json.loads is only a fragment of an object. Here is an example that does what I think you are trying to do.

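from json import loads, dumps

# Read the whole file into one string.
with open("test.json") as f:
    file_string = f.read()

# Split on the closing brace, then put the brace back before parsing.
# The last piece after the final "}" is empty, so it is dropped.
# Note: this only works for flat objects with no "}" inside string values.
dicts = [loads(f"{x}}}".replace("\n", "")) for x in file_string.split("}")[0:-1]]
for d in dicts:
    print(d)

# Write one JSON object per line. "w" overwrites the file, so rerunning
# the script does not append duplicate records.
with open("new.json", "w") as newf:
    for d in dicts:
        newf.write(f"{dumps(d)}\n")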

Output:

[root@foo home]# ./test.py
{'Item1': 'INT', 'Item2': 'INT', 'Item3': 'text', 'Item4': 'text', 'Item5': 'Date'}
{'Item1': 'INT', 'Item2': 'INT', 'Item3': 'text', 'Item4': 'text', 'Item5': 'Date'}
{'Item1': 'INT', 'Item2': 'INT', 'Item3': 'text', 'Item4': 'text', 'Item5': 'Date'}
[root@foo home]# cat new.json
{"Item1": "INT", "Item2": "INT", "Item3": "text", "Item4": "text", "Item5": "Date"}
{"Item1": "INT", "Item2": "INT", "Item3": "text", "Item4": "text", "Item5": "Date"}
{"Item1": "INT", "Item2": "INT", "Item3": "text", "Item4": "text", "Item5": "Date"}