How to select value from CSV containing JSON as a column

141 views Asked by At

I have a weird CSV with one column being JSON data of quite a large size and one is a name. Showing proper JSON formatting for clarity, but the actual file is flat with no newline chars.

name, 
{
"field1":
   {"name":"foo",
   "email":"[email protected]"},
"field2":{
     "subfield":{
          "subfield2":{
               "active":1,
               "passive":11,
               "running":111
                       }
                  }
           }
}

The input without pretty formatting…

name,{"field1":{"name":"foo","email":"[email protected]"},"field2":{"subfield":{"subfield2":{"active":1,"passive":11,"running":111}}}}

I am trying to get the name and all unique values for the subfield 2 into a output (ideally CSV).

name, 
active passive running

The issue is that the number of fields is not constant and the names of the subfields are not constant either. I have tried using jq, Miller, and sed/awk without much luck since it combines a huge JSON blob and CSV data.

3

There are 3 answers

4
aborruso On BEST ANSWER

If you run in your sample input

<input.txt sed '/name,/d' | jq -cr '.field2.subfield.subfield2 | keys[]' | paste -s -d ' ' | mlr --csv --implicit-csv-header then label name

you get

name
active passive running

The sample output you write, is wrong, because if it's a one field CSV, you do not have the , after name, there are no other fields

name, 
active passive running

But I probably didn't understand what you want

9
peak On

You could use your favorite CSV-to-TSV translator to convert the CSV to TSV and then run it through jq like so:

jq -rR '
  split("\t") 
  | [[.0], 
     ( .[1]|fromjson|.field2.subfield.subfield2 | keys_unsorted|join(" "))] 
  | @csv'
1
null_nerd On

I spent some time with sed and it got me what I needed using the below:

sed -e 's/.*subfield2\(.*\)}}}.*/\1/' input.txt