Use jq to count on multiple levels

Question

10.6k views Asked by JustChill At 24 June 2015 at 19:49

We've discovered some domain names tied to infections. Now we have a list of DNS names in a .json file, and I'd like to produce a summarized output showing: a list of users, the unique domains they visited, the total count. Bonus points if I can also get count per domain name.

Here is a sample of the file:

{"machine": "possible_victim01", "domain": "evil.com", "timestamp":1435071870}
{"machine": "possible_victim01", "domain": "evil.com", "timestamp":1435071875}
{"machine": "possible_victim01", "domain": "soevil.com", "timestamp":1435071877}
{"machine": "possible_victim02", "domain": "bad.com", "timestamp":1435071877}
{"machine": "possible_victim03", "domain": "soevil.com", "timestamp":1435071879}

Ideally, I would like the output to be something like:

{"possible_victim01": "total": 3, {"evil.com": 2, "soevil.com": 1}}
{"possible_victim02": "total": 1, {"bad.com": 1}}
{"possible_victim03": "total": 1, {"soevil.com": 1}}

I would gladly settle for:

{"possible_victim01": "total": 3, ["evil.com", "soevil.com"]}
{"possible_victim02": "total": 1, ["bad.com"]}
{"possible_victim03": "total": 1, ["soevil.com"]}

I can get a total count of records per user, but I lose the list of domains:

cat sample.json | jq -s 'group_by(.machine) | map({machine:.[0].machine,domain:.[0].domain, count:length}) '
[{"machine": "possible_victim01", "domain": "evil.com", "count": 3},  
{"machine": "possible_victim02", "domain": "bad.com", "count": 1},
{"machine": "possible_victim03", "domain": "soevil.com", "count": 1}]

The post "JQ Aggregations and Crosstabs" describes how to solve the second half of the problem. I haven't found anything yet that describes the first half, getting to:

{"machine": "possible_victim01", "domain": "evil.com", "count":2}
{"machine": "possible_victim01", "domain": "soevil.com", "count":1}
{"machine": "possible_victim02", "domain": "bad.com", "count":1}
{"machine": "possible_victim03", "domain": "soevil.com", "count":1}

There are 3 answers

jq170727 On 05 September 2017 at 07:38

Here is a solution using reduce, getpath and setpath:

reduce .[] as $o (
  {}
; [$o.machine, "total"] as $p1
| [$o.machine, "domains", $o.domain] as $p2
| setpath($p1; 1+getpath($p1))
| setpath($p2; 1+getpath($p2))
)

If filter.jq contains this filter and data.json contains the sample data then the command

$ jq -M -s -f filter.jq data.json

produces

{
  "possible_victim01": {
    "total": 3,
    "domains": {
      "evil.com": 2,
      "soevil.com": 1
    }
  },
  "possible_victim02": {
    "total": 1,
    "domains": {
      "bad.com": 1
    }
  },
  "possible_victim03": {
    "total": 1,
    "domains": {
      "soevil.com": 1
    }
  }
}
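The filter needs no initialization step because getpath returns null for a path that does not exist yet, 1 + null evaluates to 1, and setpath creates any missing intermediate objects. A minimal sketch of a single step in isolation, assuming jq 1.5 or later is on the PATH (the variable names once and twice are illustrative only):

```shell
# The path mirrors $p2 in the filter above for the first sample record.
# Starting from {}, one update creates the nested objects and sets the count to 1.
once=$(jq -n -c '["possible_victim01","domains","evil.com"] as $p
  | {} | setpath($p; 1 + getpath($p))')

# Applying the same update again increments the existing value.
twice=$(jq -n -c '["possible_victim01","domains","evil.com"] as $p
  | {} | setpath($p; 1 + getpath($p)) | setpath($p; 1 + getpath($p))')

echo "$once"
echo "$twice"
```

The first line printed should be {"possible_victim01":{"domains":{"evil.com":1}}} and the second the same object with a count of 2.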

peak On 28 June 2015 at 06:15

Using group_by in the manner described is fine, but group_by works on an array, so the whole input has to be slurped into memory first. If you have a very large number of lines (i.e. JSON entities) to read, as the sample provided suggests, you may run into performance issues and/or capacity constraints.

These issues can be resolved very effectively in any version of jq with the "inputs" builtin (e.g. jq 1.5rc1).

Please note that when using "inputs" you would invoke jq with the -n option, like this:

jq -n -f program.jq data.json
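The reason for -n can be seen with a small experiment. The sketch below, assuming jq 1.5 or later and using the five sample records from the question, simply counts records with reduce inputs, once with -n and once without (the variable names are illustrative only):

```shell
# Sample records copied from the question.
sample='{"machine": "possible_victim01", "domain": "evil.com", "timestamp":1435071870}
{"machine": "possible_victim01", "domain": "evil.com", "timestamp":1435071875}
{"machine": "possible_victim01", "domain": "soevil.com", "timestamp":1435071877}
{"machine": "possible_victim02", "domain": "bad.com", "timestamp":1435071877}
{"machine": "possible_victim03", "domain": "soevil.com", "timestamp":1435071879}'

# With -n, jq reads nothing on its own, so `inputs` yields all five records.
with_n=$(printf '%s\n' "$sample" | jq -n 'reduce inputs as $line (0; . + 1)')

# Without -n, the first record is consumed as `.` before the program runs,
# so `inputs` only yields the remaining four.
without_n=$(printf '%s\n' "$sample" | jq 'reduce inputs as $line (0; . + 1)')

echo "with -n: $with_n, without -n: $without_n"
```

Forgetting -n therefore silently drops the first record from any aggregation built on inputs.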

Please note also that it is preferable here to produce JSON output, and the following seems to be close to what is wanted:

{"possible_victim01": { "total": 3, "evildoers": {"evil.com": 2, "soevil.com": 1} },
 "possible_victim02": ...}

The following program could be made more concise but the presentation here is intended to make the process transparent, assuming a basic understanding of jq. If there is magic here, it is that one does not have to make a special case of "null".

reduce inputs as $line
  ({};
   . as $in
   | ($line.machine) as $machine
   | ($line.domain) as $domain
   | ($in[$machine].evildoers ) as $evildoers
   | . + { ($machine): {"total": (1 + $in[$machine]["total"]),
                        "evildoers": ($evildoers | (.[$domain] += 1)) }} )

Using the sample input provided, the output is:

{
  "possible_victim01": {
    "total": 3,
    "evildoers": {
      "evil.com": 2,
      "soevil.com": 1
    }
  },
  "possible_victim02": {
    "total": 1,
    "evildoers": {
      "bad.com": 1
    }
  },
  "possible_victim03": {
    "total": 1,
    "evildoers": {
      "soevil.com": 1
    }
  }
}
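The remark about not needing a special case for "null" can be checked in isolation. A small sketch, assuming jq 1.5 or later (the variable names are illustrative only), covering the two expressions the program relies on when a machine is seen for the first time:

```shell
# Indexing into null yields null, and 1 + null is 1,
# so the first "total" for a new machine comes out as 1.
first_total=$(jq -n '1 + (null | .["possible_victim01"]["total"])')

# Updating a key on null creates an object holding that key,
# so the first "evildoers" entry for a new machine needs no setup.
first_domains=$(jq -n -c 'null | (.["evil.com"] += 1)')

echo "$first_total"
echo "$first_domains"
```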

Jimmy On 24 June 2015 at 20:16 (accepted answer)

You need to do group_by twice: once to group by the machine name, and then a sub-grouping to get the sub-counts for each domain.

jq query (run with -s so that the input lines are slurped into a single array):

group_by(.machine) | map({
    "machine": .[0].machine, 
    "total":length, 
    "domains": (group_by(.domain) | map({
        "key":.[0].domain, 
        "value":length}) | from_entries
    )
})

Example output (the elements of the resulting array):

{
  "machine": "possible_victim01",
  "total": 3,
  "domains": {
    "evil.com": 2,
    "soevil.com": 1
  }
}
{
  "machine": "possible_victim02",
  "total": 1,
  "domains": {
    "bad.com": 1
  }
}
{
  "machine": "possible_victim03",
  "total": 1,
  "domains": {
    "soevil.com": 1
  }
}
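The question also mentions the flat per-machine, per-domain rows as an intermediate step. A minimal sketch of one way to get them, assuming jq 1.5 or later and the sample data from the question, is to group on the pair of fields in a single group_by (the variable name rows is illustrative only):

```shell
# -s slurps the lines into one array; grouping on [.machine, .domain]
# puts identical machine/domain pairs together, and length counts them.
rows=$(jq -s -c 'group_by([.machine, .domain])
  | map({machine: .[0].machine, domain: .[0].domain, count: length})
  | .[]' <<'EOF'
{"machine": "possible_victim01", "domain": "evil.com", "timestamp":1435071870}
{"machine": "possible_victim01", "domain": "evil.com", "timestamp":1435071875}
{"machine": "possible_victim01", "domain": "soevil.com", "timestamp":1435071877}
{"machine": "possible_victim02", "domain": "bad.com", "timestamp":1435071877}
{"machine": "possible_victim03", "domain": "soevil.com", "timestamp":1435071879}
EOF
)

echo "$rows"
```

This should print one compact object per machine/domain pair, four lines for the sample, starting with the evil.com row for possible_victim01 with a count of 2.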
