I am trying to process a large 3 GB JSON file. Currently, the jq command below takes nearly 40 minutes to process the entire file. I want to know how I can use a parallelism/multithreading approach in jq to reduce the processing time. I am using v1.5.
Command Used:
jq.exe -r -s "map(.\"results\" | map({\"ID\": (((.\"body\"?.\"party\"?.\"xrefs\"?.\"xref\"//[] | map(select(.\"type\"==\"ID\"))[]?.\"id\"?))//null), \"Name\": (((.\"body\"?.\"party\"?.\"general-info\"?.\"full-name\"?))//null)} | [(.\"ID\"//\"\"|tostring), (.\"Name\"//\"\"|tostring)])) | add[] | join(\"~\")" "C:\InputFile.txt" >"C:\OutputFile.txt"
My data:
{
"results": [
{
"_id": "0000001",
"body": {
"party": {
"related-parties": {},
"general-info": {
"last-update-ts": "2011-02-14T08:21:51.000-05:00",
"full-name": "Ibercaja Gestion SGIIC SAPensiones Nuevas Oportunidades",
"status": "ACTIVE",
"last-update-user": "TS42922",
"create-date": "2011-02-14T08:21:51.000-05:00",
"classifications": {
"classification": [
{
"code": "PENS"
}
]
}
},
"xrefs": {
"xref": [
{
"type": "LOCCU1",
"id": "X00893X"
},
{
"type": "ID",
"id": "1012227139"
}
]
}
}
}
},
{
"_id": "000002",
"body": {
"party": {
"related-parties": {},
"general-info": {
"last-update-ts": "2015-05-21T15:10:45.174-04:00",
"full-name": "Innova Capital Sp zoo",
"status": "ACTIVE",
"last-update-user": "jw74592",
"create-date": "1994-08-31T00:00:00.000-04:00",
"classifications": {
"classification": [
{
"code": "CORP"
}
]
}
},
"xrefs": {
"xref": [
{
"type": "ULTDUN",
"id": "144349875"
},
{
"type": "AVID",
"id": "6098743"
},
{
"type": "LOCCU1",
"id": "1001210218"
},
{
"type": "ID",
"id": "1001210218"
},
{
"type": "BLMBRG",
"id": "10009050"
},
{
"type": "REG_CO",
"id": "0000068508"
},
{
"type": "SMCI",
"id": "13159"
}
]
}
}
}
}
]
}
Can someone please tell me which command I need to use in v1.5 in order to achieve parallelism/multithreading?
For a file of this size, you need to stream the file in and process one item at a time. First seek to '"results": [', then use a function called something like 'readItem' that matches braces with a stack (or a simple depth counter), appending each character to a buffer, and deserializes the item once the brace that opened it is closed again. Note that braces inside JSON string literals must not be counted, so the reader also has to track whether it is currently inside a string; a sketch follows below.
I recommend node.js + lodash as the implementation stack.
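Here is a minimal sketch of that reader in TypeScript (runnable with ts-node, or compiled and run with node). The file path, the exact '"results": [' marker (which assumes that spacing appears literally in your input), and the onItem callback are illustrative assumptions, not something from your question:

// streamItems.ts - sketch of the brace-matching reader described above.
import { createReadStream } from "fs";

// Assumed marker; this relies on the file containing exactly this spacing.
const MARKER = '"results": [';

function streamItems(path: string, onItem: (item: any) => void): Promise<void> {
  return new Promise((resolve, reject) => {
    let seeking = true;   // still looking for MARKER?
    let tail = "";        // sliding window so a chunk boundary can't split the marker
    let depth = 0;        // current {} nesting depth inside the results array
    let started = false;  // have we seen the opening brace of the current item?
    let buffer = "";      // accumulated text of the current item
    let inString = false; // inside a "..." literal, where braces don't count
    let escaped = false;  // previous character was a backslash

    const stream = createReadStream(path, { encoding: "utf8" });
    stream.on("error", reject);
    stream.on("end", () => resolve());
    stream.on("data", (chunk) => {
      let text = String(chunk);
      if (seeking) {
        tail += text;
        const at = tail.indexOf(MARKER);
        if (at < 0) { tail = tail.slice(-MARKER.length); return; }
        seeking = false;
        text = tail.slice(at + MARKER.length); // start scanning just past the marker
      }
      for (const ch of text) {
        if (started) buffer += ch;
        if (inString) { // skip string contents, honoring \" escapes
          if (escaped) escaped = false;
          else if (ch === "\\") escaped = true;
          else if (ch === '"') inString = false;
        } else if (ch === '"') {
          inString = true;
        } else if (ch === "{") {
          if (!started) { started = true; buffer = "{"; }
          depth++;
        } else if (ch === "}" && started && --depth === 0) {
          onItem(JSON.parse(buffer)); // deserialize one complete item
          started = false;
          buffer = "";
        }
      }
    });
  });
}

// Usage sketch: reproduce the ID~Name lines your jq filter was producing.
streamItems("C:/InputFile.txt", (item) => {
  const xrefs = item?.body?.party?.xrefs?.xref ?? [];
  const id = xrefs.find((x: any) => x.type === "ID")?.id ?? "";
  const name = item?.body?.party?.["general-info"]?.["full-name"] ?? "";
  console.log(id + "~" + name);
});

Memory use stays at roughly one item's worth of text instead of the whole 3 GB, which is where your slurped (-s) jq invocation was hurting you. If you still want parallelism on top of this, you can hand each parsed item off to a worker pool, but extract-and-print per item is cheap enough that the single-threaded scan may already be sufficient.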