How can I clean up empty fields when converting CSV to JSON with Miller?

674 views Asked by At

I have several CSV files of item data for a game I'm messing around with that I need to convert to JSON for consumption. The data can be quite irregular with several empty fields per record, which makes for sort of ugly JSON output.

Example with dummy values:

Id,Name,Value,Type,Properties/1,Properties/2,Properties/3,Properties/4
01:Foo:13,Foo,13,ACME,CanExplode,IsRocket,,
02:Bar:42,Bar,42,,IsRocket,,,
03:Baz:37,Baz,37,BlackMesa,CanExplode,IsAlive,IsHungry,

Converted output:

[
{
  "Id": "01:Foo:13",
  "Name": "Foo",
  "Value": 13,
  "Type": "ACME",
  "Properties": ["CanExplode", "IsRocket", ""]
},
{
  "Id": "02:Bar:42",
  "Name": "Bar",
  "Value": 42,
  "Type": "",
  "Properties": ["IsRocket", "", ""]
},
{
  "Id": "03:Baz:37",
  "Name": "Baz",
  "Value": 37,
  "Type": "BlackMesa",
  "Properties": ["CanExplode", "IsAlive", "IsHungry"]
}
]

So far I've been quite successful with using Miller. I've managed to remove completely empty columns from the CSV as well as aggregate the Properties/X columns into a single array.

But now I'd like to do two more things to improve the output format to make consuming the JSON easier:

  • remove empty strings "" from the Properties array
  • replace the other empty strings "" (e.g. Type of the second record) with null

Desired output:

[
{
  "Id": "01:Foo:13",
  "Name": "Foo",
  "Value": 13,
  "Type": "ACME",
  "Properties": ["CanExplode", "IsRocket"]
},
{
  "Id": "02:Bar:42",
  "Name": "Bar",
  "Value": 42,
  "Type": null,
  "Properties": ["IsRocket"]
},
{
  "Id": "03:Baz:37",
  "Name": "Baz",
  "Value": 37,
  "Type": "BlackMesa",
  "Properties": ["CanExplode", "IsAlive", "IsHungry"]
}
]

Is there a way to achieve that with Miller?

My current commands are:

  • mlr -I --csv remove-empty-columns file.csv to clean up the columns
  • mlr --icsv --ojson --jflatsep '/' --jlistwrap cat file.csv > file.json for the conversion
2

There are 2 answers

0
aborruso On BEST ANSWER

It's not probably the way you want to do it. I use also jq.

Running

mlr --c2j  --jflatsep '/' --jlistwrap remove-empty-columns then cat input.csv | \
jq '.[].Properties|=map(select(length > 0))' | \
jq '.[].Type|=(if . == "" then null else . end)'

you will have

[
  {
    "Id": "01:Foo:13",
    "Name": "Foo",
    "Value": 13,
    "Type": "ACME",
    "Properties": [
      "CanExplode",
      "IsRocket"
    ]
  },
  {
    "Id": "02:Bar:42",
    "Name": "Bar",
    "Value": 42,
    "Type": null,
    "Properties": [
      "IsRocket"
    ]
  },
  {
    "Id": "03:Baz:37",
    "Name": "Baz",
    "Value": 37,
    "Type": "BlackMesa",
    "Properties": [
      "CanExplode",
      "IsAlive",
      "IsHungry"
    ]
  }
]
0
Fravadona On

Using Miller, you can "filter out" the empty fields from each record with:

mlr --c2j --jflatsep '/' --jlistwrap put '
    $* = select($*, func(k,v) {return v != ""})
' file.csv

remark: actually, we're building a new record containing the non-empty fields instead of deleting the empty fields from the record; the final result is equivalent though:

[
{
  "Id": "01:Foo:13",
  "Name": "Foo",
  "Value": 13,
  "Type": "ACME",
  "Properties": ["CanExplode", "IsRocket"]
},
{
  "Id": "02:Bar:42",
  "Name": "Bar",
  "Value": 42,
  "Properties": ["IsRocket"]
},
{
  "Id": "03:Baz:37",
  "Name": "Baz",
  "Value": 37,
  "Type": "BlackMesa",
  "Properties": ["CanExplode", "IsAlive", "IsHungry"]
}
]