I am trying to load a large 3 GB JSON file. Currently, with JQ utility I can load the entire file in nearly 40 mins. Now, I want to know how I can use parallelism/multi threading approach in JQ in order to complete the process in less amount of time. I am using v1.5
Command Used:
JQ.exe -r -s "map(.\"results\" | map({\"ID\": (((.\"body\"?.\"party\"?.\"xrefs\"?.\"xref\"//[] | map(select(ID))[]?.\"id\"?))//null), \"Name\": (((.\"body\"?.\"party\"?.\"general-info\"?.\"full-name\"?))//null)} | [(.\"ID\"//\"\"|tostring), (.\"Name\"//\"\"|tostring)])) | add[] | join(\"~\")" "\C:\InputFile.txt" >"\C:\OutputFile.txt"
My data:
{
  "results": [
    {
      "_id": "0000001",
      "body": {
        "party": {
          "related-parties": {},
          "general-info": {
            "last-update-ts": "2011-02-14T08:21:51.000-05:00",
            "full-name": "Ibercaja Gestion SGIIC SAPensiones Nuevas Oportunidades",
            "status": "ACTIVE",
            "last-update-user": "TS42922",
            "create-date": "2011-02-14T08:21:51.000-05:00",
            "classifications": {
              "classification": [
                {
                  "code": "PENS"
                }
              ]
            }
          },
          "xrefs": {
            "xref": [
              {
                "type": "LOCCU1",
                "id": "X00893X"
              },
              {
                "type": "ID",
                "id": "1012227139"
              }
            ]
          }
        }
      }
    },
    {
      "_id": "000002",
      "body": {
        "party": {
          "related-parties": {},
          "general-info": {
            "last-update-ts": "2015-05-21T15:10:45.174-04:00",
            "full-name": "Innova Capital Sp zoo",
            "status": "ACTIVE",
            "last-update-user": "jw74592",
            "create-date": "1994-08-31T00:00:00.000-04:00",
            "classifications": {
              "classification": [
                {
                  "code": "CORP"
                }
              ]
            }
          },
          "xrefs": {
            "xref": [
              {
                "type": "ULTDUN",
                "id": "144349875"
              },
              {
                "type": "AVID",
                "id": "6098743"
              },
              {
                "type": "LOCCU1",
                "id": "1001210218"
              },
              {
                "type": "ID",
                "id": "1001210218"
              },
              {
                "type": "BLMBRG",
                "id": "10009050"
              },
              {
                "type": "REG_CO",
                "id": "0000068508"
              },
              {
                "type": "SMCI",
                "id": "13159"
              }
            ]
          }
        }
      }
    }
  ]
}
Can someone please help me which command I need to use in v1.5 in order to achieve parallelism/multithreading.
 
                        
For a file of this size, you need to stream the file in and process one item at a time. First seek to '"results": [' then use a function called something like 'readItem' that uses a stack to match braces, until your opening brace closes, appending each character to your buffer, then deserializes the item once the closing brace is found.
I recommend node.js + lodash for implementation language.