How can I clean up empty fields when converting CSV to JSON with Miller?

Question

How can I clean up empty fields when converting CSV to JSON with Miller?

674 views Asked by Aldoro At 29 September 2022 at 19:41

I have several CSV files of item data for a game I'm messing around with that I need to convert to JSON for consumption. The data can be quite irregular with several empty fields per record, which makes for sort of ugly JSON output.

Example with dummy values:

Id,Name,Value,Type,Properties/1,Properties/2,Properties/3,Properties/4
01:Foo:13,Foo,13,ACME,CanExplode,IsRocket,,
02:Bar:42,Bar,42,,IsRocket,,,
03:Baz:37,Baz,37,BlackMesa,CanExplode,IsAlive,IsHungry,

Converted output:

[
{
  "Id": "01:Foo:13",
  "Name": "Foo",
  "Value": 13,
  "Type": "ACME",
  "Properties": ["CanExplode", "IsRocket", ""]
},
{
  "Id": "02:Bar:42",
  "Name": "Bar",
  "Value": 42,
  "Type": "",
  "Properties": ["IsRocket", "", ""]
},
{
  "Id": "03:Baz:37",
  "Name": "Baz",
  "Value": 37,
  "Type": "BlackMesa",
  "Properties": ["CanExplode", "IsAlive", "IsHungry"]
}
]

So far I've been quite successful with using Miller. I've managed to remove completely empty columns from the CSV as well as aggregate the Properties/X columns into a single array.

But now I'd like to do two more things to improve the output format to make consuming the JSON easier:

remove empty strings "" from the Properties array
replace the other empty strings "" (e.g. Type of the second record) with null

Desired output:

[
{
  "Id": "01:Foo:13",
  "Name": "Foo",
  "Value": 13,
  "Type": "ACME",
  "Properties": ["CanExplode", "IsRocket"]
},
{
  "Id": "02:Bar:42",
  "Name": "Bar",
  "Value": 42,
  "Type": null,
  "Properties": ["IsRocket"]
},
{
  "Id": "03:Baz:37",
  "Name": "Baz",
  "Value": 37,
  "Type": "BlackMesa",
  "Properties": ["CanExplode", "IsAlive", "IsHungry"]
}
]

Is there a way to achieve that with Miller?

My current commands are:

mlr -I --csv remove-empty-columns file.csv to clean up the columns
mlr --icsv --ojson --jflatsep '/' --jlistwrap cat file.csv > file.json for the conversion

Original Q&A

There are 2 answers

Fravadona On 03 December 2022 at 22:01

Using Miller, you can "filter out" the empty fields from each record with:

mlr --c2j --jflatsep '/' --jlistwrap put '
    $* = select($*, func(k,v) {return v != ""})
' file.csv

remark: actually, we're building a new record containing the non-empty fields instead of deleting the empty fields from the record; the final result is equivalent though:

[
{
  "Id": "01:Foo:13",
  "Name": "Foo",
  "Value": 13,
  "Type": "ACME",
  "Properties": ["CanExplode", "IsRocket"]
},
{
  "Id": "02:Bar:42",
  "Name": "Bar",
  "Value": 42,
  "Properties": ["IsRocket"]
},
{
  "Id": "03:Baz:37",
  "Name": "Baz",
  "Value": 37,
  "Type": "BlackMesa",
  "Properties": ["CanExplode", "IsAlive", "IsHungry"]
}
]

**aborruso** · Accepted Answer · 2022-10-07T11:32:47+00:00

It's not probably the way you want to do it. I use also jq.

Running

mlr --c2j  --jflatsep '/' --jlistwrap remove-empty-columns then cat input.csv | \
jq '.[].Properties|=map(select(length > 0))' | \
jq '.[].Type|=(if . == "" then null else . end)'

you will have

[
  {
    "Id": "01:Foo:13",
    "Name": "Foo",
    "Value": 13,
    "Type": "ACME",
    "Properties": [
      "CanExplode",
      "IsRocket"
    ]
  },
  {
    "Id": "02:Bar:42",
    "Name": "Bar",
    "Value": 42,
    "Type": null,
    "Properties": [
      "IsRocket"
    ]
  },
  {
    "Id": "03:Baz:37",
    "Name": "Baz",
    "Value": 37,
    "Type": "BlackMesa",
    "Properties": [
      "CanExplode",
      "IsAlive",
      "IsHungry"
    ]
  }
]

TechQA.

How can I clean up empty fields when converting CSV to JSON with Miller?

There are 2 answers

Related Questions in JSON

Related Questions in CSV

Related Questions in MILLER

Popular Questions

Trending Questions