I am trying to match a number at the end of a line ($), print the relevant paragraphs, and ignore the third paragraph. Here is the data:
this is first paragraph
number 200
with some text

this is second paragraph
with some text
number 200

this is third paragraph
with some text
number 2001

This command matches only first paragraph:
awk -v RS="" -v ORS="\n\n" "/number 200\n/" file
This command matches only second paragraph:
awk -v RS="" -v ORS="\n\n" "/number 200$/" file
Seems the problem is that awk understands character "$" as end of record instead of line. Is there an elegant way to overcome this? Unfortunately I do not have grep that can work with paragraphs.
UPDATE:
Expected output:
this is first paragraph
number 200
with some text

this is second paragraph
with some text
number 200

Using any awk:
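A minimal sketch of one such command, assuming the sample data from the question is stored with a blank line between paragraphs. The temporary file and the variable names `file` and `out` are only there to make the example self-contained. Instead of relying on `$` alone, the regexp accepts either a newline or the end of the record after the number:

```shell
# Recreate the sample data from the question (paragraphs separated by blank lines).
# The temp file is an assumption for illustration; any input file works the same way.
file=$(mktemp)
printf '%s\n' \
  'this is first paragraph' 'number 200' 'with some text' '' \
  'this is second paragraph' 'with some text' 'number 200' '' \
  'this is third paragraph' 'with some text' 'number 2001' > "$file"

# In paragraph mode (RS=""), $ only matches at the end of the whole record,
# so allow the number to be followed by a newline OR by the end of the record.
out=$(awk -v RS="" -v ORS='\n\n' '/number 200(\n|$)/' "$file")
printf '%s\n' "$out"

rm -f "$file"
```

With the sample data this should print the first and second paragraphs and skip the third, because "number 2001" has a "1" after "200" rather than a newline or the end of the record.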
Regarding
Seems the problem is that awk understands character "$" as end of record instead of line
- that's not a problem, that's the definition of $. In a regexp $ means end of string, it only appears to mean end of line if the string you're matching against just happens to be a single line, e.g. as read by grep, sed, and awk by default. When you're matching against a string containing multiple lines (e.g. using -z in GNU grep or GNU sed, or RS="" in awk, or RS='^$' in GNU awk) then you should expect $ to match just once at the end of that string (and ^ just once at the start of it); there's nothing special about newlines versus any other character in the string and no regexp metachar to match them.
Regarding
Unfortunately I do not have grep that can work with paragraphs
- no-one does as, unlike awk, grep doesn't have a paragraph mode.
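The behaviour of `$` and `^` described above can be checked with a small experiment. This is only an illustrative sketch: the two-line input and the variable names are made up for the demonstration. In paragraph mode the record is the single string "a", newline, "b", so the anchors match only at its very start and very end:

```shell
out=$(printf 'a\nb\n' | awk -v RS="" '{
    # The whole record is one string: "a", a newline, then "b".
    end_a   = ($0 ~ /a$/)        # "a" is followed by a newline, not by the end of the string
    end_b   = ($0 ~ /b$/)        # "b" is the last character of the string
    start_b = ($0 ~ /^b/)        # "b" starts a line, but not the string
    line_b  = ($0 ~ /(^|\n)b/)   # start of string OR a preceding newline
    print end_a, end_b, start_b, line_b
}')
printf '%s\n' "$out"   # prints: 0 1 0 1
```

The last test shows the general workaround: to anchor at line boundaries inside a multi-line string, match the newline explicitly alongside `^` or `$`.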