Remove all characters before a specific number in the Nth column at the first match, not the last


I am trying to remove a random block of characters from the 5th column in a dataset.

Sample data:

A | 12 | AA | 24 | Test to go and keep 192.168.1.1 > 192.168.2.1 | B

Result should look like:

A | 12 | AA | 24 | 192.168.1.1 > 192.168.2.1 | B

I have this so far:

awk 'BEGIN{FS=OFS="|"} {gsub(".*? 192","", $5 )} 1' file.txt

However, this removes everything in the 5th column before the last match.

What the code does now:

.168.2.11

I need to remove everything before the first match, not the last.

2

There are 2 answers

5
RavinderSingh13 On

With your shown samples, please try the following awk code. A simple explanation: set the field separator and output field separator to | for all lines of Input_file. Then globally replace all whitespace and alphabetic characters in the 5th field with an empty string (note that this also removes the spaces around >). Add a space before and after the 5th field, as in the shown samples, and finally print the current line, edited or not.

awk 'BEGIN{FS=OFS="|"} {gsub(/[[:alpha:]]+|[[:space:]]+/,"",$5);$5=" "$5" "} 1' Input_file
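
If the spaces around > need to survive, one possible variation is to strip only the leading run of non-digit characters instead of every letter and space. This is a minimal sketch, assuming the unwanted text in the 5th field never contains a digit; the function name strip_leading_text is made up for illustration. awk regular expressions are POSIX EREs and have no lazy `.*?`, so the sketch anchors the match at the start of the field rather than trying to stop early.

```shell
# Hypothetical helper (name chosen for illustration); reads lines on stdin.
strip_leading_text() {
  awk 'BEGIN { FS = OFS = "|" }
       # Only touch lines that really have a 5th field.
       NF >= 5 {
         # ^ anchors at the start of the field, so only the leading run
         # of non-digit characters is replaced (by a single space).
         sub(/^[^0-9]*/, " ", $5)
       }
       1'
}

printf '%s\n' 'A | 12 | AA | 24 | Test to go and keep 192.168.1.1 > 192.168.2.1 | B' |
  strip_leading_text
# prints: A | 12 | AA | 24 | 192.168.1.1 > 192.168.2.1 | B
```

The NF >= 5 guard is there because assigning to $5 on a shorter line would otherwise add empty fields.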


EDIT: In case you always want to match the "IP address > IP address" form in the 5th field, simply try the following.

awk 'BEGIN{FS=OFS="|"} match($0,/([0-9]+\.){3}[0-9]+ > ([0-9]+\.){3}[0-9]+/){$5=substr($0,RSTART,RLENGTH)} 1' Input_file
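
A variant of the same idea, sketched under a few assumptions: it searches only the 5th field instead of the whole line, and it puts back the single space on each side that the sample output shows. The function name keep_ip_pair is hypothetical. The four octets are written out in full because some older awk implementations may not support interval expressions such as {3}.

```shell
# Hypothetical helper (name chosen for illustration); reads lines on stdin.
keep_ip_pair() {
  awk 'BEGIN { FS = OFS = "|" }
       # Look for "IPv4 > IPv4" inside the 5th field only.
       match($5, /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+ > [0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/) {
         # Keep just the matched text and restore the one-space padding.
         $5 = " " substr($5, RSTART, RLENGTH) " "
       }
       1'
}

printf '%s\n' 'A | 12 | AA | 24 | Test to go and keep 192.168.1.1 > 192.168.2.1 | B' |
  keep_ip_pair
# prints: A | 12 | AA | 24 | 192.168.1.1 > 192.168.2.1 | B
```

Lines whose 5th field has no such pair are printed unchanged, since the assignment only runs when match() succeeds.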
2
pii_ke On

If the IP address in column 5 of your file always starts with some particular numbers, for example " 192.168.", then you can use:

awk 'BEGIN{FS=OFS="|"}{$5=substr($5, index($5, " 192.168."))}1' file.txt
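
index() returns the position of the first occurrence, which is why this cuts at the first address rather than the last. The same approach could be wrapped up as below, as a rough sketch: the function name strip_before_prefix and the awk variable prefix are invented for illustration, and the explicit check for a missing prefix is an addition rather than part of the one-liner above.

```shell
# Hypothetical helper (name chosen for illustration); reads lines on stdin.
# The prefix, including its leading space, is passed in as an awk variable.
strip_before_prefix() {
  awk -v prefix=" 192.168." 'BEGIN { FS = OFS = "|" }
       {
         pos = index($5, prefix)            # 0 when the prefix is absent
         if (pos > 0) $5 = substr($5, pos)  # cut at the FIRST occurrence
       }
       1'
}

printf '%s\n' 'A | 12 | AA | 24 | Test to go and keep 192.168.1.1 > 192.168.2.1 | B' |
  strip_before_prefix
# prints: A | 12 | AA | 24 | 192.168.1.1 > 192.168.2.1 | B
```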