Convert arbitrary JSON to CSV using a template/schema language

As part of an ETL pipeline, I'd like to convert a JSON string into tabular CSV format. The structure of the JSON object is fairly arbitrary, and possibly only parts of it can be converted into a tabular format. A user should be able to change how the conversion works for the specific JSON object being converted, so there needs to be some sort of configuration (perhaps a template).

For instance:

{
  "data": [{"a": 12, "b": 34, "c": 56}, {"a": 78, "b": 90, "c": 10}, ...],
  "other": "stuff"
}

or

[
  {"a": 12, "b": 34, "c": 56}, {"a": 78, "b": 90, "c": 10}, ...
]

or

{
  "some": {
     "really": {
        "nested": {
          "data": [{"a": 12, "b": 34, "c": 56}, {"a": 78, "b": 90, "c": 10}, ...]
        }
     }
  }
}

is conceivable. In fact, it's difficult to make any assumptions about the JSON structure, except that it contains some data that can be transformed into a tabular format.

I'm looking for a template or schema "language" that allows me to convert this data from the JSON object to CSV format. The conversion will run in Python. I have seen Handlebars used for this kind of thing in the past (and found pybars3), which would probably work. A Handlebars template for the third example:

a,b,c\n
{{#each some.really.nested.data}}
{{this.a}},{{this.b}},{{this.c}}\n
{{/each}}

I could also use Mako or Jinja, but I'm afraid that they are too powerful and would allow a template to execute arbitrary code (I'm not sure whether this concern is justified).

My question: does anyone have experience with any of the template languages I mention and can point out possible caveats? Or are there other template languages for this kind of problem that I should consider?

Let me know if this question is too vague. I hope it doesn't get closed as too open-ended (I'm happy to edit/refine it).

There is 1 answer

Answer by Ian Wilson

I don't see why you wouldn't use the Python standard library for this, i.e. csv.DictWriter.

  • Read the JSON into Python using the standard library (json)
  • Tabulate manually in Python
  • Write to CSV with valid encoding and quoting using the standard library (csv)
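
A minimal sketch of those three steps, assuming the user-supplied "configuration" is just a list of keys leading to the records plus a list of column names. The function name json_to_csv and its parameters are made up for illustration; only json and csv.DictWriter come from the standard library.

```python
import csv
import io
import json


def json_to_csv(json_text, path, columns):
    """Return CSV text for the list of records found at `path`.

    `path` is a list of keys leading to the records (hypothetical config
    format; use an empty list when the document itself is the list).
    """
    # Step 1: read the JSON with the standard library.
    node = json.loads(json_text)

    # Step 2: "tabulate manually" by walking down to the list of records.
    for key in path:
        node = node[key]

    # Step 3: let csv.DictWriter handle the header, quoting and escaping.
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",  # silently skip keys that are not columns
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(node)
    return buffer.getvalue()


doc = (
    '{"some": {"really": {"nested": {"data": '
    '[{"a": 12, "b": 34, "c": 56}, {"a": 78, "b": 90, "c": 10}]}}}}'
)
result = json_to_csv(doc, ["some", "really", "nested", "data"], ["a", "b", "c"])
print(result)
```

Because the path and column list are plain data, they could be stored per source as configuration without giving the user a way to run code. Records missing a column are written with an empty value (the DictWriter default).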

I think Mako and the like are mainly meant for generating HTML (well-formed or not) and wouldn't handle proper CSV quoting and encoding for you.
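
To illustrate the quoting point, here is a small sketch comparing naive string interpolation (roughly what a text template does when it joins values with commas) against the csv module. The sample row is invented for the demonstration.

```python
import csv
import io

# Invented sample row: one value contains quotes, another contains a comma.
rows = [{"a": "plain", "b": 'say "hi"', "c": "x,y"}]

# Naive approach: join the values with commas, as a plain text template would.
naive = "\n".join("{a},{b},{c}".format(**row) for row in rows)

# csv module: quotes and escapes fields only where needed.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["a", "b", "c"], lineterminator="\n")
writer.writerows(rows)
quoted = buffer.getvalue()

# Read both results back to see how many fields a CSV parser finds.
parsed_naive = next(csv.reader([naive]))
parsed_quoted = next(csv.reader(io.StringIO(quoted)))

print(len(parsed_naive), parsed_naive)
print(len(parsed_quoted), parsed_quoted)
```

The naive line parses back as four fields instead of three because the comma inside "x,y" is taken as a delimiter, while the csv output round-trips to the original three values.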