Run an external command within jq to manipulate each values of a particular key

72 views Asked by At

I have a json like

[
  {
    "url": "https://drive.google.com/file/d/1tO-qVknlH0PLK9CblQsyd568ZiptdKff/view?usp=share_link",
    "title": "– Flexibility"
  },
  {
    "url": "https://drive.google.com/open?id=11_sR8X13lmPcvlT3POfMW3044f3wZdra",
    "title": "– Pronouns"
  }
]

I got it using curl -Lfs "https://motivatedsisters.com/2019/07/08/arabic-review-sr-rahat-basit/" | rg -o '<li>.*?href="(.*?)".*?</a> (.*?)<\/li>' -r '{"url": "$1", "title": "$2"}' | jq -s '.'.

I have a command in my machine named unescape_html, a python scipt to unescape the html (replace – with appropriate character).

How can I apply this function on each of the titles using jq.

For example:

I want to run:

unescape_html "&#8211; Flexibility"
unescape_html "&#8211; Pronouns"

The expected output is:

[
  {
    "url": "https://drive.google.com/file/d/1tO-qVknlH0PLK9CblQsyd568ZiptdKff/view?usp=share_link",
    "title": "– Flexibility"
  },
  {
    "url": "https://drive.google.com/open?id=11_sR8X13lmPcvlT3POfMW3044f3wZdra",
    "title": "– Pronouns"
  }
]

Update 1:

If jq doesn't have that feature, then i am also fine that the command is applied on rg.

I mean, can i run unescape_html on $2 on the line:

rg -o '<li>.*?href="(.*?)".*?</a> (.*?)<\/li>' -r '{"url": "$1", "title": "$2"}'

Any other bash approach to solve this problem is also fine. The point is, i need to run unescape_html on title, so that i get the expected output.

Update 2:

The following command:

curl -Lfs "https://motivatedsisters.com/2019/07/08/arabic-review-sr-rahat-basit/" | rg -o '<li>.*?href="(.*?)".*?</a> (.*?)<\/li>' -r '{"url": "$1", "title": "$2"}' | jq -s '.' | jq 'map(.title |= @sh "unescape_html \(.)")'

gives:

[
  {
    "url": "https://drive.google.com/file/d/1tO-qVknlH0PLK9CblQsyd568ZiptdKff/view?usp=share_link",
    "title": "unescape_html '&#8211; Flexibility'"
  },
  {
    "url": "https://drive.google.com/open?id=11_sR8X13lmPcvlT3POfMW3044f3wZdra",
    "title": "unescape_html '&#8211; Pronouns'"
  }
]

Just not evaluating the commands.

The following command works:

curl -Lfs "https://motivatedsisters.com/2019/07/08/arabic-review-sr-rahat-basit/" | rg -o '<li>.*?href="(.*?)".*?</a> (.*?)<\/li>' -r '{"url": "$1", "title": "$2"}' | jq -s '.' | jq 'map(.title |= sub("&#8211;"; "–"))'

But it only works for &#8211;. It will not work for other reserved characters.

Update 3:

The following command:

curl -Lfs "https://motivatedsisters.com/2019/07/08/arabic-review-sr-rahat-basit/" | rg -o '<li>.*?href="(.*?)".*?</a> (.*?)<\/li>' -r '{"url": "$1", "title": "$2"}' | jq -s '.' | jq -r '.[] | "unescape_html \(.title | @sh)"' | bash

is giving:

– Flexibility
– Pronouns

So, it is applying the bash function. Now I need to format is so that the urls are come with the title in json format.

1

There are 1 answers

0
Ahmad Ismail On

OP Here.

Both of the following commands are working for me:

curl -Lfs "https://motivatedsisters.com/2019/07/08/arabic-review-sr-rahat-basit/" | rg -o '<li>.*?href="(.*?)".*?</a> (.*?)<\/li>' -r 'echo $1 $(unescape_html "$2")' | bash | rg -o '(https?://\S+)\s(.*)' -r 'echo $1 $(unescape_html "$2")'  -r '{"url": "$1", "title": "$2"}' | jq -s '.'

and

curl -Lfs "https://motivatedsisters.com/2019/07/08/arabic-review-sr-rahat-basit/" | rg '<li>.*?href="(.*?)".*?</a> (.*?)<\/li>' -or '$1 $2' | while read -r link title; do echo "$link" "$(unescape_html "$title")"; done | rg -o '(https?://\S+)\s(.*)' -r '{"url": "$1", "title": "$2"}' | jq -s '.'