Parse multivalued JSON in GREL (OpenRefine)

1.8k views Asked by At

I have a column with the following content:

7. {"resource":"abc"}
8. [{"resource":"def"},{"resource":"ghi"}]

I try to get the content of "resource":

value.parseJson().resource

Works. If I try to get the content of multivalued cells, I can't get it to work. I tried according to https://github.com/OpenRefine/OpenRefine/wiki/GREL-Other-Functions:

forEach(value.parseJson().resource,v,v.resource)

I get a lot of errors, for example:

7.  {"resource":"abc"}  Error: First argument to forEach is not an array
8.  [{"resource":"def"},{"resource":"ghi"}] Error: Object does not have any field, including resource

And I can only get the multivalued cells, but not the single value cells with this one:

forEach(value.parseJson(),v,v.resource)
1

There are 1 answers

1
Owen Stephens On BEST ANSWER

A problem you have here is the difference between the cells that contain JSON arrays (i.e. multiple values) and those that don't.

There are different ways of approaching this issue, and the best approach may depend on how consistent the data is overall.

My first suggestion would be to use a filter or facet to work with the single value and array cells separately. Given the data you use in your example I think the following would work:

1) Create a Custom Text Facet on the column using the GREL

value.startsWith("[")

2) Select 'false' from the facet to work with the single value cells 3) For these cells use the GREL

value.parseJson().resource

4) Select 'true' from the facet to work with the array cells 5) For these cells use the GREL

forEach(value.parseJson(),v,v.resource)

(n.b. this is slightly different to the GREL you mention in your question as giving errors)

6) The output of this GREL is an OpenRefine array. You'll need to convert this to a string to store the output in a cell - so you may need to use something like:

forEach(value.parseJson(),v,v.resource).join("|")