Highlight tab-separated values based on a couple of rules


I have a tab-separated file in which I want to conditionally highlight certain columns or certain values.

Example of source file:

wn       name          Building Name       8       Desc       char       -> bl
wo       bl_id!*       Building Code       8                  char
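The listing above shows the columns padded with spaces, but the rules depend on real tab characters. Here is a minimal sketch that recreates the sample as file.tsv with actual tabs. It assumes the second row has an empty 5th column and no 7th column:

```shell
#!/bin/sh
# Recreate the sample data with real tab separators.
# Assumption: row 2 has an empty 5th column and no 7th column.
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' wn name 'Building Name' 8 Desc char '-> bl' > file.tsv
printf '%s\t%s\t%s\t%s\t%s\t%s\n' wo 'bl_id!*' 'Building Code' 8 '' char >> file.tsv

# Print the field count of each row when splitting on tabs.
awk -F '\t' '{ print NR ": " NF " fields" }' file.tsv
```

The awk line should report 7 fields for the first row and 6 for the second. If it reports 1, the file contains spaces rather than tabs.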

I want to:

  • color the contents of the 2nd column yellow when the name ends with "!"
  • color the contents of the 2nd column cyan and the contents of the 7th column green when such a 7th column exists
  • color the little "*" sign in the 2nd column red

and a couple of other such rules.

Currently I do it like this:

cat file.tsv \
    | sed -r 's/^([^\t]*)\t([^\t]*)!/\1\t\x1b[33m\2!\x1b[0m/' \
    | sed -r 's/^^([^\t]*)\t([^\t]*)\t(.*)\t-> ([^\t]+)/\1\t\x1b[36m\2\x1b[0m\t\3\t-> \x1b[32m\4\x1b[0m/' \
    | sed -r 's/\*/\x1b[31;1m&\x1b[0m/'

But it's quite complex to read and update.

Is there a better way? I'm quite sure there is, but which one?

Are tools like GRC or Supercat the way to go? I do have one important constraint, though: the solution should work out of the box in Cygwin. For the portability of my code, I don't want to have to compile tools myself there.

Can you give hints on how to improve the code to get this kind of highlighting?


There are 2 answers

Lars Fischer (best answer)

You could do it with GNU awk, like this (col.awk):

function colText ( text, col) {
    return sprintf("\033[%sm%s\033[39;0m", col, text);
}

function yellow( text ){ return colText( text, "33;1" ); }
function cyan  ( text ){ return colText( text, "36;1" ); }
function green ( text ){ return colText( text, "32;1" ); }
function red   ( text ){ return colText( text, "31;1" ); }

BEGIN {FS=OFS="\t";}

# red * in $2 (no next here, so the rules below still apply)
$2 ~ /\*/ {gsub(/\*/, red("*"), $2); }

# cyan col 2 and green col 7 if col 7 present
NF == 7 {$2 = cyan($2); $7 = green($7); print; next;}

# yellow col 2 if there is a !
$2 ~ /!/ {$2 = yellow($2); print; next;}

# print every other line unchanged (otherwise it would be dropped)
{print}

Use it like this: gawk -f col.awk file.tsv
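You can check what the rules emit without a color terminal by piping the output through cat -v, which displays the ESC character as ^[. The sketch below is self-contained. It inlines the script's rules as a shell function (the name highlight is hypothetical) and resets colors with a plain \033[0m. It assumes a POSIX awk and the sample layout shown. The final { print } keeps lines that match no rule:

```shell
#!/bin/sh
# Sketch: run the answer's rules on a sample and make the escape codes visible.
# Assumptions: a POSIX awk (gawk, mawk, busybox awk) and the sample layout below.

# Sample input with real tab characters.
printf 'wn\tname\tBuilding Name\t8\tDesc\tchar\t-> bl\n' > sample.tsv
printf 'wo\tbl_id!*\tBuilding Code\t8\t\tchar\n' >> sample.tsv

# The rules of col.awk, inlined; the final { print } keeps unmatched lines.
highlight() {
    awk '
    function colText(text, col) { return sprintf("\033[%sm%s\033[0m", col, text) }
    BEGIN     { FS = OFS = "\t" }
    $2 ~ /\*/ { gsub(/\*/, colText("*", "31;1"), $2) }   # red *, falls through
    NF == 7   { $2 = colText($2, "36;1"); $7 = colText($7, "32;1"); print; next }
    $2 ~ /!/  { $2 = colText($2, "33;1"); print; next }   # yellow name with !
              { print }                                    # all other lines as-is
    ' "$@"
}

# cat -v shows ESC as ^[, so the codes can be inspected on any terminal.
highlight sample.tsv | cat -v
```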

Thomas Dickey

You could make the script more readable by using shell variables (and tput for portability across terminal types):

BOLD_RED=$(tput setaf 1; tput bold)
GREEN=$(tput setaf 2)
YELLOW=$(tput setaf 3)
CYAN=$(tput setaf 6)
COLOR_OFF=$(tput sgr0)

cat file.tsv \
    | sed -r 's/^([^\t]*)\t([^\t]*)!/\1\t'"$YELLOW"'\2!'"$COLOR_OFF"'/' \
    | sed -r 's/^([^\t]*)\t([^\t]*)\t(.*)\t-> ([^\t]+)/\1\t'"$CYAN"'\2'"$COLOR_OFF"'\t\3\t-> '"$GREEN"'\4'"$COLOR_OFF"'/' \
    | sed -r 's/\*/'"$BOLD_RED"'&'"$COLOR_OFF"'/'
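One possible refinement, which is not part of the original answer, is to set the variables only when standard output is a terminal that tput can describe. Redirecting the output to a file then yields plain text. Here is a sketch, assuming a POSIX sh:

```shell
#!/bin/sh
# Sketch: define the color variables only when stdout is a terminal and
# tput supports setaf; otherwise leave them empty so the output stays plain.
if [ -t 1 ] && tput setaf 1 >/dev/null 2>&1; then
    BOLD_RED=$(tput setaf 1; tput bold)
    GREEN=$(tput setaf 2)
    YELLOW=$(tput setaf 3)
    CYAN=$(tput setaf 6)
    COLOR_OFF=$(tput sgr0)
else
    BOLD_RED='' GREEN='' YELLOW='' CYAN='' COLOR_OFF=''
fi

# Colored on a terminal, plain text when redirected.
printf '%sexample%s\n' "$CYAN" "$COLOR_OFF"
```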

Likewise, you could put the regular-expression fragment that matches a run of non-tab characters into a variable, e.g.,

NO_TABS='([^\t]*)'
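Here is how the fragment might be used in the three substitutions. The sketch assumes GNU sed (as shipped with Cygwin), which understands -E and \t inside bracket expressions. It produces the color codes with printf instead of tput, so it also runs without a terminal. The function name highlight is hypothetical. Passing all the rules as -e expressions to a single sed gives the same result as the three-stage pipeline, because each substitution is applied in order to the already-modified line:

```shell
#!/bin/sh
# Sketch: reuse a NO_TABS fragment in the sed rules (assumes GNU sed, as on Cygwin).
# Plain ANSI escapes via printf stand in for tput here.
YELLOW=$(printf '\033[33m')
CYAN=$(printf '\033[36m')
GREEN=$(printf '\033[32m')
BOLD_RED=$(printf '\033[31;1m')
COLOR_OFF=$(printf '\033[0m')

NO_TABS='([^\t]*)'   # one tab-free field, captured

highlight() {
    sed -E \
        -e 's/^'"$NO_TABS"'\t'"$NO_TABS"'!/\1\t'"$YELLOW"'\2!'"$COLOR_OFF"'/' \
        -e 's/^'"$NO_TABS"'\t'"$NO_TABS"'\t(.*)\t-> ([^\t]+)/\1\t'"$CYAN"'\2'"$COLOR_OFF"'\t\3\t-> '"$GREEN"'\4'"$COLOR_OFF"'/' \
        -e 's/\*/'"$BOLD_RED"'&'"$COLOR_OFF"'/' \
        "$@"
}

printf 'wo\tbl_id!*\tBuilding Code\t8\t\tchar\n' | highlight
```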

By the way, the second sed expression in the question repeats ^ (^^), which is a typo.