grep/sed replace no match with blank space


Example of a file I'm grepping information from:

name : server1
description : webserver
memory : 32gb

name : server2
memory : 128gb

name : server3
description : appserver

I'm doing something like this:

cat myfile | egrep -w "name|description|memory" | awk -F" " '{print $3}' >> myfile2

This extracts the value after the colon (the third whitespace-separated field, $3) from each matching line of myfile.
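For illustration, the extraction step can be reproduced on the sample data above with a minimal sketch. It assumes a POSIX-like shell with grep and awk; sample_data and extract_values are hypothetical helper names, and grep -E is used as the current spelling of egrep. The output is a flat list of values with nothing indicating which key was missing for a given server:

```shell
# Hypothetical helper: prints the sample file from the question.
sample_data() {
  printf '%s\n' 'name : server1' 'description : webserver' 'memory : 32gb' '' \
    'name : server2' 'memory : 128gb' '' \
    'name : server3' 'description : appserver'
}

# Hypothetical helper: the question's grep | awk stage, reading stdin.
# awk's default whitespace splitting makes $3 the word after the colon.
extract_values() {
  grep -Ew 'name|description|memory' | awk '{print $3}'
}

# Prints 7 lines: server2 and server3 each contribute only two values.
sample_data | extract_values
```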

Then I format myfile2 so that each server's information is on one line (using tr to replace the CRLF line breaks), with the values separated by semicolons, so I can import them into Excel.

myfile2:
server1;webserver;32gb
server2;128gb
server3;appserver

The problem: when egrep finds no match (like description for server2 or memory for server3), that row is simply missing from myfile2, so the remaining values shift. How can I replace the missing value with an empty field?

Desired output in myfile2:
server1;webserver;32gb
server2;;128gb
server3;appserver;

Answer by ghoti (best answer)

I don't see the need to use grep on your input data. The awk command can do almost everything that grep can. Consider the following:

awk -F' *: *' '
  {
    a[$1]=$2;
  }

  /^memory/ {
    printf("%s;%s;%s\n", a["name"], a["description"], a["memory"]);
    delete a;
  }' myfile

The components here are as follows:

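  • -F' *: *' sets the field separator to a colon plus any surrounding spaces, so $1 is the key and $2 is the value.
  • a[$1]=$2 stores each line's value in a short-lived array, keyed by its field name.
  • /^memory/ runs the printing action only on the memory line, which is assumed to be the last line of each group,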
  • printf(...) displays your output, and
  • delete a lets you start fresh on the next multi-line record.
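As an illustration of the point that awk can take over grep's job here, this is a rough awk-only equivalent of the question's egrep | awk stage. It is a sketch, not part of the original answer: it compares the key in the first field exactly, which is slightly stricter than egrep -w matching a word anywhere on the line. sample_data and awk_only_values are hypothetical names.

```shell
# Hypothetical helper: prints the sample file from the question.
sample_data() {
  printf '%s\n' 'name : server1' 'description : webserver' 'memory : 32gb' '' \
    'name : server2' 'memory : 128gb' '' \
    'name : server3' 'description : appserver'
}

# Hypothetical helper: filter and extract in a single awk call.
# With FS set to ' *: *', $1 is the key and $2 is the value after the colon.
awk_only_values() {
  awk -F' *: *' '$1 ~ /^(name|description|memory)$/ { print $2 }'
}

sample_data | awk_only_values
```

Using the same -F' *: *' separator as the answer also keeps a value such as "web server" intact, whereas print $3 with default splitting would keep only "web".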

You could of course compact this all onto a single line:

awk -F' *: *' '{ a[$1]=$2 } /^memory/ { printf("%s;%s;%s\n", a["name"], a["description"], a["memory"]); delete a }' myfile

Is this what you need?
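Run against the sample data in the question, the one-liner shows the assumption built into it: a record is printed only when its memory line is reached, so server3, which has no memory line, produces no output at all. A minimal sketch, wrapping the one-liner in a hypothetical shell function by_memory (sample_data is also a hypothetical helper):

```shell
# Hypothetical helper: prints the sample file from the question.
sample_data() {
  printf '%s\n' 'name : server1' 'description : webserver' 'memory : 32gb' '' \
    'name : server2' 'memory : 128gb' '' \
    'name : server3' 'description : appserver'
}

# Hypothetical wrapper around the one-liner above, reading stdin.
by_memory() {
  awk -F' *: *' '{ a[$1]=$2 } /^memory/ { printf("%s;%s;%s\n", a["name"], a["description"], a["memory"]); delete a }'
}

# Prints server1;webserver;32gb and server2;;128gb, but nothing for server3.
sample_data | by_memory
```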

UPDATE

I see that you've modified your question to include sample data that differs from what the solution above supports. Here's an update that should work with the current example; it is a standalone awk script, so save it to a file and run it with awk -f:

function outp() {
        printf("%s;%s;%s\n", a["name"], a["description"], a["memory"]);
}

BEGIN {
        seen=0;
        FS=" *: *";
}

/^name/ && seen {
        outp();
        delete a;
}

/^name/ {
        seen=1;
}

{
        a[$1]=$2;
}

END {
        outp();
}

This uses a function (outp()) to simplify things. It uses the seen variable to determine whether the script has seen any actual data yet (otherwise, the first match of /^name/ would generate empty output). And it continues to use the a array to collect the important fields.

It's important to note that now, instead of assuming you'll have a "memory" at the end of every record, we assume that you'll have a "name" at the beginning of every record. If this assumption is incorrect, please specify how records can be told apart (i.e. where one stops and the next one starts). Blank lines are an option, for example.
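To try the updated script without creating a separate .awk file, the same program can be passed inline from the shell. This is a minimal sketch; by_name and sample_data are hypothetical names, and the only change to the logic is that END calls outp() only when seen is set, so empty input prints nothing instead of a bare ;; line.

```shell
# Hypothetical helper: prints the sample file from the question.
sample_data() {
  printf '%s\n' 'name : server1' 'description : webserver' 'memory : 32gb' '' \
    'name : server2' 'memory : 128gb' '' \
    'name : server3' 'description : appserver'
}

# Hypothetical wrapper: the UPDATE script, inlined and reading stdin.
by_name() {
  awk '
    function outp() {
      printf("%s;%s;%s\n", a["name"], a["description"], a["memory"])
    }
    BEGIN { seen = 0; FS = " *: *" }
    # A new "name" line closes the previous record, if there was one.
    /^name/ && seen { outp(); delete a }
    /^name/ { seen = 1 }
    # Every line (blank ones included) stores its value under its key.
    { a[$1] = $2 }
    # Flush the last record, but only if any record was seen.
    END { if (seen) outp() }
  '
}

# Prints server1;webserver;32gb, server2;;128gb and server3;appserver;
sample_data | by_name
```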

Answer by Ed Morton

If every record always contains all three lines in the same order (even when a value is empty), it SOUNDS like all you need is:

$ awk -v RS= -F' *: *|\n' -v OFS=';' '{print $2,$4,$6}' myfile
server1;webserver;32gb
server2;;128gb

If you want CRLF line endings, just tell awk that by adding -v ORS='\r\n' to the command line.
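Because the one-liner picks values by position ($2, $4, $6), it relies on every record having all three lines in the same order. A quick way to see how awk numbers the fields in paragraph mode (RS set to the empty string) is to dump them; show_fields and sample_data are hypothetical names used only for this sketch. With the current sample data, server2's memory value lands in $4, not $6.

```shell
# Hypothetical helper: prints the sample file from the question.
sample_data() {
  printf '%s\n' 'name : server1' 'description : webserver' 'memory : 32gb' '' \
    'name : server2' 'memory : 128gb' '' \
    'name : server3' 'description : appserver'
}

# Hypothetical helper: same RS and FS as the one-liner, reading stdin.
show_fields() {
  awk -v RS= -F' *: *|\n' '{
    # One output line per record, listing fieldnumber=value pairs.
    for (i = 1; i <= NF; i++)
      printf "%d=%s%s", i, $i, (i < NF ? " " : "\n")
  }'
}

# The second line printed is: 1=name 2=server2 3=memory 4=128gb
sample_data | show_fields
```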

Not sure why you haven't just updated your question yet, but it sounds like this is what you really need:

$ cat file  
name : server1
description : webserver
memory : 32gb

name : server2
memory : 128gb

name : server3
description : appserver

$ cat tst.awk
BEGIN{
    RS=""
    FS=" *: *|\n"
    OFS=";"
    numNames = split("name description memory",names,/ /)
    for (i=1; i<=numNames; i++) {
        name2nr[names[i]] = i
    }
}
{
    delete vals
    for (i=1;i<=NF;i+=2) {
        vals[name2nr[$i]] = $(i+1)
    }
    for (i=1; i<=numNames; i++) {
        printf "%s%s", vals[i], (i<numNames?OFS:ORS)
    }
}

$ awk -f tst.awk file
server1;webserver;32gb
server2;;128gb
server3;appserver;

The script could be extended with a first pass that discovers the field names instead of hard-coding them in the BEGIN section, but then the output order of the fields would depend on the order in which they first appear in the input, so I'm not sure it's worthwhile in this case.
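A sketch of that two-pass idea follows. It assumes the data is in a regular file, because awk has to read it twice (standard input cannot be rewound); by_discovered_keys and sample_data are hypothetical names. The first pass (NR == FNR) numbers each key in order of first appearance, and the second pass prints the values in that order.

```shell
# Hypothetical helper: prints the sample file from the question.
sample_data() {
  printf '%s\n' 'name : server1' 'description : webserver' 'memory : 32gb' '' \
    'name : server2' 'memory : 128gb' '' \
    'name : server3' 'description : appserver'
}

# Hypothetical helper: discovers the keys on a first pass over "$1",
# then prints one semicolon-separated line per blank-line-separated record.
by_discovered_keys() {
  awk '
    BEGIN { RS = ""; FS = " *: *|\n"; OFS = ";" }
    # Pass 1: give each key a column number the first time it appears.
    NR == FNR {
      for (i = 1; i <= NF; i += 2)
        if (!($i in key2nr)) key2nr[$i] = ++numKeys
      next
    }
    # Pass 2: put each value in its key column, then print every column.
    {
      delete vals
      for (i = 1; i <= NF; i += 2)
        vals[key2nr[$i]] = $(i + 1)
      for (i = 1; i <= numKeys; i++)
        printf "%s%s", vals[i], (i < numKeys ? OFS : ORS)
    }
  ' "$1" "$1"
}

sample=$(mktemp)
sample_data > "$sample"
by_discovered_keys "$sample"
rm -f "$sample"
```

With the sample data the columns come out as name, description, memory because server1 lists them in that order; a file whose first record started with memory would put memory in the first column, which is the ordering drawback described above.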