Parse a JSON file downloaded from Azure Data Lake


I downloaded a file from Azure Data Lake that has the following format:

{"PartitionKey":"2020-10-05","value":"Resolved"...}
{"PartitionKey":"2020-10-06","value":"Resolved"...}

I just want to read and parse this in Python.

import json

def read_ods_file():
    file_path = 'temp.json'
    data = []
    # Each line of the file is expected to hold one JSON object
    with open(file_path) as f:
        for line in f:
            data.append(json.loads(line))
    return data

This gave me the exception:

          data.append(json.loads(line))
        File "C:\python3.6\lib\json\__init__.py", line 354, in loads
          return _default_decoder.decode(s)
        File "C:\python3.6\lib\json\decoder.py", line 339, in decode
          obj, end = self.raw_decode(s, idx=_w(s, 0).end())
        File "C:\python3.6\lib\json\decoder.py", line 357, in raw_decode
          raise JSONDecodeError("Expecting value", s, err.value) from None
      json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Printing the lines shows extra characters at the start. What are these added characters?

{"PartitionKey":"2020-10-05","value":"Resolved"...}

{"PartitionKey":"2020-10-06","value":"Resolved"...}

There are 2 answers

Pieter Svenson:

Files produced by Microsoft tools often contain unexpected non-ASCII characters. You could filter each line against string.printable so that only ASCII characters remain, as described in this related question:

How can I remove non-ASCII characters but leave periods and spaces using Python?
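
A minimal sketch of that filtering idea, assuming each line holds one JSON object. The helper names strip_non_printable and parse_json_lines are made up for illustration. Note that this drops every non-ASCII character, including any that legitimately appear inside the JSON values, so it is a blunt workaround rather than a fix for the underlying encoding.

```python
import json
import string

# Characters considered "normal": ASCII letters, digits, punctuation, whitespace
PRINTABLE = set(string.printable)

def strip_non_printable(text):
    """Return text with every character outside string.printable removed."""
    return "".join(ch for ch in text if ch in PRINTABLE)

def parse_json_lines(lines):
    """Parse an iterable of lines, one JSON object per non-empty line."""
    records = []
    for line in lines:
        cleaned = strip_non_printable(line).strip()
        if cleaned:  # skip blank lines
            records.append(json.loads(cleaned))
    return records
```

It could be used as parse_json_lines(f) inside the with open(file_path) as f: block from the question.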

Sparrow1029:

The f variable you set with

with open(file_path) as f:

is a Python file object (of type _io.TextIOWrapper). To parse each line as a JSON object, try something like:

import json

file_path = 'temp.json'
data = []
with open(file_path) as f:
    # Read the whole file into a string, strip surrounding whitespace,
    # then split it into a list of lines
    for line in f.read().strip().splitlines():
        data.append(json.loads(line))
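
If the extra characters turn out to be a UTF-8 byte order mark (an assumption: the question does not show them, but a BOM at the start of the file is a common cause of "Expecting value: line 1 column 1 (char 0)"), opening the file with the utf-8-sig codec may be enough. The sketch below uses a hypothetical helper name, read_json_lines, and also strips a BOM that appears at the start of later lines, in case the file was built by concatenating pieces.

```python
import json

def read_json_lines(file_path):
    """Read a file containing one JSON object per line, tolerating a UTF-8 BOM."""
    records = []
    # "utf-8-sig" decodes UTF-8 and drops a BOM at the very start of the file
    with open(file_path, encoding="utf-8-sig") as f:
        for line in f:
            # str.strip() does not remove U+FEFF, so strip it explicitly
            line = line.strip().lstrip("\ufeff")
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records
```

To check the assumption first, the opening bytes can be inspected with open(file_path, 'rb').read(4); a UTF-8 BOM shows up as b'\xef\xbb\xbf'.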