I'm trying to filter a JSON file for all "description" keys whose value contains "; Since version", and to print each key's path along with the version contained in that value. I do all this in a bash script.

The JSON I'm filtering is from an API and has the "description" key on different paths.

An excerpt of the JSON returned by the server is included at the end of this post.

So far I grab the JSON from the server with curl and pipe it to jq applying the following filter to get a list of all paths which contain the value in their description:

curl $api | jq 'paths(objects and (.description|index("; Since version")))'

This returns a list with paths to the endpoints containing the searched value and looks like this:

[
  "paths",
  "/v4/user/profileAttributes/{key}",
  "delete"
]
[
  "paths",
  "/v4/users/{user_id}/last_admin_rooms",
  "get"
]
[
  "definitions",
  "GeneralSettings",
  "properties",
  "s3TagsEnabled"
]

I also found the filter below. Once adapted to the string I'm searching for, it returns the value of only one, seemingly random, "description" key/value pair instead of the whole list. Unfortunately I don't fully understand the filter, as the author didn't explain it in their post. When adapted,

curl -s -N $api | jq '. as $in
| reduce paths(type == "string" and test("; Since version")) as $path ({};
    ($in|getpath($path)) as $x
    | if ($path[-1]|type) == "string"
      then .[$path[-1]] = $x
      else .[$path[-2]|tostring] += [$x]
      end )'

returns

{
  "description": "Some_Text_We_Dont_Care_About; Since version 4.10.2 Some_More_Text_We_Dont_Care_About"
}

I would be able to stitch the two filtered responses together if the second filter returned all entries instead of the (seemingly random) single one. I've posted both filters here in case someone knows a cleaner way to do it.

Ideally the end result would return a list with entries like this:

{
  "path": [
    "paths",
    "/v4/users/{user_id}/last_admin_rooms",
    "get"
  ],
  "version": {
    "description": "Some_Text_We_Dont_Care_About; Since version 4.10.2 Some_More_Text_We_Dont_Care_About"
  }
}

Here's an excerpt/example of what the input looks like:

{
  "info": {4 items},
  "host": "some_hostname",
  "basePath": "/api",
  "tags": [19 items],
  "paths": {
    "/v4/config/info/defaults": {
      "get": {
        "tags": [
          "config"
        ],
        "summary": "Get default values",
        "description": "SomeText; Since version 4.6.0 SomeMoreText",
        "operationId": "getSystemDefaultsInfo",
        "produces": [
          "application/json;charset=UTF-8"
        ]
      }
    },
    "/v4/config/info/general": {
      "get": {
        "tags": [
          "config"
        ],
        "summary": "Get general values",
        "description": "SomeText; Since version 4.6.0 SomeMoreText",
        "operationId": "getSystemDefaultsInfo",
        "produces": [
          "application/json;charset=UTF-8"
        ]
      }
    }
  },
  "definitions": {
    "GeneralSettings": {
      "type": "object",
      "properties": {
        "cryptoEnabled": {
          "type": "boolean",
          "description": "Activation status of encryption"
        },
        "s3TagsEnabled": {
          "type": "boolean",
          "description": "Defines if S3 tags are enabled; Since version 4.9.0 NEW"
        },
        "sharePasswordSmsEnabled": {
          "type": "boolean",
          "description": "Allow sending of share passwords via SMS"
        }
      }
    }
  }
}

1 Answer

Let me offer an alternative solution using jtc, a walk-path-based Unix utility.

Based on your desired output, the jtc query would look like this:

bash $ <file.json jtc -w'[description]:<.*; Since version.*>R: [-1] <act>k [-1] <path>k [-1] <pathname>k' -T'{ "path": [ {{pathname}}, {{path}}, {{act}}], "version": { "description": {{$0}}} }'
{
   "path": [
      "paths",
      "/v4/config/info/defaults",
      "get"
   ],
   "version": {
      "description": "SomeText; Since version 4.6.0 SomeMoreText"
   }
}
{
   "path": [
      "paths",
      "/v4/config/info/general",
      "get"
   ],
   "version": {
      "description": "SomeText; Since version 4.6.0 SomeMoreText"
   }
}
{
   "path": [
      "GeneralSettings",
      "properties",
      "s3TagsEnabled"
   ],
   "version": {
      "description": "Defines if S3 tags are enabled; Since version 4.9.0 NEW"
   }
}
bash $ 

Given that your JSON is irregular (with respect to how deep each description sits), I'm not sure whether you need to see that last object or not. If you don't need it, then, knowing the structure of your JSON, it's easy to enhance the query to rule out false-positive matches like that last one.

EDIT: Explanations:

The -T option provides a JSON template, where template items (enclosed in double curly braces {{..}}) get interpolated from the namespaces at the end of each walk (-w), if the walk was successful.

The walk-path (-w) describes how to walk the source JSON: jtc lets you walk the JSON up and down freely (using subscripts [..]) and perform recursive searches <..>. Some of the lexemes are directives, though: they don't do any searching or matching, but instead apply certain actions (the one-letter suffix in the notation <..>S defines whether the lexeme is a search or a directive).

Let me break it down here (btw, all the lexemes are described in the link):

  • [description]:<.*; Since version.*>R: - performs an RE search for JSON strings containing ; Since version; because of how the RE is written, it matches the entire string (note the .* at the beginning and at the end of the expression). The attached label [description]: ensures that the RE match is attempted only on strings with that label (and not on any others). The : at the end of the lexeme (the quantifier) instructs jtc to find all such occurrences (the quantifiers in both searches and subscripts are compatible with Python notation).
  • [-1] <act>k - [-1] addresses the parent node of the found entry (from the previous walk step), and <act>k memorizes that entry's label in the namespace act (for later template interpolation). The suffix at the end of the lexeme defines whether it's a search or a directive (in this case k is a directive that extracts a label and memorizes it in the namespace).
  • The rest of the walk path works the same way: [-1] <path>k addresses the parent of the last addressed entry (which, in turn, is the parent of the entry found in the first step) and memorizes its label (key) in the namespace under the name path (names are arbitrary in jtc).
  • [-1] <pathname>k - does pretty much the same thing. Then the walk is over (for the given iteration, that is: recall that the first walk lexeme is iterative and finds each occurrence) and template interpolation is applied, which results in the printed item.
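
For readers who don't have jtc at hand, the idea behind the walk (find every matching "description", then collect the keys of its ancestors) can be sketched in plain Python. This is only an illustration under my own assumptions, not how jtc works internally: the function name find_descriptions and the trimmed sample document are made up for the example, and unlike the three [-1] steps above, the sketch records the full path from the root rather than exactly three ancestor labels.

```python
import json


def find_descriptions(node, needle="; Since version", path=()):
    """Yield one result per object whose "description" string contains needle.

    Hypothetical helper: each result has the shape
    {"path": [...keys from the root...], "version": {"description": "..."}}.
    """
    if isinstance(node, dict):
        desc = node.get("description")
        # Only string descriptions are tested; a nested object that happens
        # to be called "description" is skipped here but still walked below.
        if isinstance(desc, str) and needle in desc:
            yield {"path": list(path), "version": {"description": desc}}
        for key, child in node.items():
            yield from find_descriptions(child, needle, path + (key,))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from find_descriptions(child, needle, path + (index,))


# Trimmed, made-up sample modelled on the excerpt in the question.
sample = json.loads("""
{
  "paths": {
    "/v4/config/info/defaults": {
      "get": {
        "summary": "Get default values",
        "description": "SomeText; Since version 4.6.0 SomeMoreText"
      }
    }
  },
  "definitions": {
    "GeneralSettings": {
      "properties": {
        "cryptoEnabled": {
          "description": "Activation status of encryption"
        },
        "s3TagsEnabled": {
          "description": "Defines if S3 tags are enabled; Since version 4.9.0 NEW"
        }
      }
    }
  }
}
""")

results = list(find_descriptions(sample))
for entry in results:
    print(json.dumps(entry, indent=2))
```

If only the endpoint entries are wanted, the matches under "definitions" can be dropped afterwards, for example with [r for r in results if r["path"][:1] == ["paths"]].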

You can easily play with the query (removing lexemes or adding your own to see how walking works). I edited the walk path so that the lexemes are separated by spaces.

PS. Disclosure: I'm the creator of the jtc tool.