Highlight tab-separated values based on a couple of rules

Question


Asked by user3341592 on 27 February 2016 at 21:46

I have a tab-separated file in which I want to conditionally highlight certain columns or certain values.

Example of source file:

wn       name          Building Name       8       Desc       char       -> bl
wo       bl_id!*       Building Code       8                  char
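In the example above, the tabs are displayed as runs of spaces. To reproduce it with real tab characters, a minimal sketch could generate file.tsv with printf. This assumes that the second row has an empty 5th (Desc) field and no 7th field, which is how the columns appear to line up.

```shell
# Write the two sample rows with real tab characters.
# Assumption: row 2 has an empty 5th field (Desc) and no 7th field.
printf 'wn\tname\tBuilding Name\t8\tDesc\tchar\t-> bl\n' > file.tsv
printf 'wo\tbl_id!*\tBuilding Code\t8\t\tchar\n' >> file.tsv

# Print the number of tab-separated fields on each line.
awk -F'\t' '{ print NF }' file.tsv
```

The awk call should report 7 fields for the first row and 6 for the second, which is what the "7th column" rule depends on.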

I want to:

put the contents of the 2nd column in yellow when the name is suffixed with "!"
put the contents of the 2nd column in cyan, and the contents of the 7th column in green, when there is such a 7th column
put the little "*" sign in the 2nd column in red

and a couple of other such rules.

Currently I do it like this:

cat file.tsv \
    | sed -r 's/^([^\t]*)\t([^\t]*)!/\1\t\x1b[33m\2!\x1b[0m/' \
    | sed -r 's/^^([^\t]*)\t([^\t]*)\t(.*)\t-> ([^\t]+)/\1\t\x1b[36m\2\x1b[0m\t\3\t-> \x1b[32m\4\x1b[0m/' \
    | sed -r 's/\*/\x1b[31;1m&\x1b[0m/'

But it's quite hard to read and to update.

Is there a better way? I'm quite sure there is, but which one?

Are tools like GRC or Supercat the way to go? I have to admit one important constraint, though: I'd like the solution to work out of the box in Cygwin. For the portability of my code, I don't want to have to compile tools there myself.

Can you give hints on how to improve this code to get such "highlight" functionality?


There are 2 answers

Thomas Dickey · 27 February 2016 at 22:49

You could make the script more readable by using shell variables (and tput for portability across terminal types):

BOLD_RED=$(tput setaf 1; tput bold)
GREEN=$(tput setaf 2)
YELLOW=$(tput setaf 3)
CYAN=$(tput setaf 6)
COLOR_OFF=$(tput sgr0)
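A possible refinement, sketched under the assumption that colors are only wanted when writing to a terminal: define the variables only if standard output is a terminal ([ -t 1 ]) and tput works, and leave them empty otherwise, so output redirected to a file or to another program stays free of escape sequences. The pipeline itself is unchanged.

```shell
# Define the color variables only when stdout is a terminal and tput works;
# otherwise leave them empty, so piped or redirected output has no escape codes.
if [ -t 1 ] && tput sgr0 >/dev/null 2>&1; then
    BOLD_RED=$(tput setaf 1; tput bold)
    GREEN=$(tput setaf 2)
    YELLOW=$(tput setaf 3)
    CYAN=$(tput setaf 6)
    COLOR_OFF=$(tput sgr0)
else
    BOLD_RED='' GREEN='' YELLOW='' CYAN='' COLOR_OFF=''
fi
```

With empty variables, the sed substitutions simply reproduce their input, so the same script works both interactively and in a pipeline.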

cat file.tsv \
    | sed -r 's/^([^\t]*)\t([^\t]*)!/\1\t'"$YELLOW"'\2!'"$COLOR_OFF"'/' \
    | sed -r 's/^([^\t]*)\t([^\t]*)\t(.*)\t-> ([^\t]+)/\1\t'"$CYAN"'\2'"$COLOR_OFF"'\t\3\t-> '"$GREEN"'\4'"$COLOR_OFF"'/' \
    | sed -r 's/\*/'"$BOLD_RED"'&'"$COLOR_OFF"'/'

Likewise, you could make a variable for the regular-expression chunk that matches everything but tabs, e.g.,

NO_TABS='([^\t]*)'
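As a sketch of how NO_TABS could be used: the three substitutions from this answer, wrapped in a shell function and run as a single sed with several -e scripts, which applies them in order just like the original pipeline. The function name highlight is hypothetical; GNU sed is assumed (for -r and for \t inside brackets), and the color variables fall back to empty strings if tput cannot run (for example when $TERM is unset).

```shell
# Colors via tput; "|| :" keeps a failing tput from returning an error,
# in which case the variables are simply empty and nothing is highlighted.
BOLD_RED=$( { tput setaf 1; tput bold; } 2>/dev/null || :)
GREEN=$(tput setaf 2 2>/dev/null || :)
YELLOW=$(tput setaf 3 2>/dev/null || :)
CYAN=$(tput setaf 6 2>/dev/null || :)
COLOR_OFF=$(tput sgr0 2>/dev/null || :)

# Everything but tabs, captured as one group.
NO_TABS='([^\t]*)'

# Hypothetical helper: reads the files given as arguments, or stdin if none.
# Variables are expanded when the function runs, not when it is defined.
highlight() {
    sed -r \
        -e 's/^'"$NO_TABS"'\t'"$NO_TABS"'!/\1\t'"$YELLOW"'\2!'"$COLOR_OFF"'/' \
        -e 's/^'"$NO_TABS"'\t'"$NO_TABS"'\t(.*)\t-> ([^\t]+)/\1\t'"$CYAN"'\2'"$COLOR_OFF"'\t\3\t-> '"$GREEN"'\4'"$COLOR_OFF"'/' \
        -e 's/\*/'"$BOLD_RED"'&'"$COLOR_OFF"'/' \
        "$@"
}
```

Usage: highlight file.tsv, or pipe data into highlight. The color strings are inserted into the replacement text, so they must not contain /, & or backslashes; the sequences produced by tput normally don't.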

By the way, one of the sed expressions in the question repeats ^ (a typo).

Lars Fischer · Accepted Answer · 28 February 2016 at 01:54

You could do it with GNU awk, like this (col.awk):

function colText ( text, col) { 
    return sprintf("\033[%sm%s\033[39;0m", col, text); 
}

function yellow( text ){ return colText( text, "33;1" ); }
function cyan  ( text ){ return colText( text, "36;1" ); }
function green ( text ){ return colText( text, "32;1" ); }
function red   ( text ){ return colText( text, "31;1" ); } 

BEGIN {FS=OFS="\t";}

# red * in $2
$2 ~ /\*/ {gsub(/\*/, red("*"), $2); }

# cyan if col 7 present
NF == 7 {print $1, cyan($2), $3, $4, $5, $6, green( $7 ) ;
         next;}

# yellow col2 if there is a !
$2 ~ /!/ { $2 = yellow($2) }

# print all remaining lines, colored or not (NF == 7 lines were printed above)
{ print }

Use it like this: gawk -f col.awk file.tsv
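Given the Cygwin constraint in the question, it may be convenient to keep everything in one shell script. A sketch under that assumption embeds the same rules in a hypothetical shell function, colorize_tsv, instead of a separate col.awk. It assumes that any POSIX awk is enough, since the program only uses standard features (functions, sprintf, gsub, \033 escapes), and it ends with a catch-all { print } so that lines matching no rule still appear.

```shell
# Hypothetical wrapper: the col.awk rules inlined, reading files or stdin.
colorize_tsv() {
    awk '
    function colText(text, col) { return sprintf("\033[%sm%s\033[39;0m", col, text) }
    BEGIN { FS = OFS = "\t" }
    # red "*" inside column 2
    $2 ~ /\*/ { gsub(/\*/, colText("*", "31;1"), $2) }
    # cyan column 2 and green column 7 when there are exactly 7 columns
    NF == 7   { $2 = colText($2, "36;1"); $7 = colText($7, "32;1"); print; next }
    # yellow column 2 when it contains "!"
    $2 ~ /!/  { $2 = colText($2, "33;1") }
    # everything not printed yet, colored or not
              { print }
    ' "$@"
}
```

Usage: colorize_tsv file.tsv, or some_command | colorize_tsv. Note that, unlike the sed version, this colors the whole 7th column, including the leading "-> ".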
