Bash: Replace values of a column while retaining line order.

Question

Bash: Replace values of a column while retaining line order.

105 views Asked by Sigurgeir At 08 June 2015 at 09:15

I have a file, FILE1, of some 19.000 lines that has the following format:

PAAXXXX PAAXXXX 0 0 1 -9
PAAXXXY PAAXXXY 0 0 1 -9
PAAXXYX PAAXXYX 0 0 2 -9
PAAXYXX PAAXYXX 0 0 2 -9
PAAYXXX PAAYXXX 0 0 1 -9
PAAYYXX PAAYYXX 0 0 1 -9
PAAYYYX PAAYYYX 0 0 2 -9
PAAAAAB PAAAAAB 0 0 1 -9

The IDs in columns 1 and 2 are the same in each case, col 3 and 4 are always 0, col 5 is either 1 or 2 and col 6 is always -9.

I have two more files, FILE2 and FILE3, which contain only IDs and do not overlap. For example:

head -3 FILE2
    PAAXXYX 
    PAAXYXX
    PAAYXXX

head -2 FILE3
    PAAYYYX
    PAAAAAB

I would like to change the value of col6 in FILE1 based on overlapping of col1 or col2 with the IDs in files 2 and 3. Specifically, I would like to assign 1 to column 6 for all IDs found in FILE2 and 2 for all IDs found in FILE3. My final output might therefore look something like this:

PAAXXXX PAAXXXX 0 0 1 -9
PAAXXXY PAAXXXY 0 0 1 -9
PAAXXYX PAAXXYX 0 0 2 1
PAAXYXX PAAXYXX 0 0 2 1
PAAYXXX PAAYXXX 0 0 1 1
PAAYYXX PAAYYXX 0 0 1 -9
PAAYYYX PAAYYYX 0 0 2 2
PAAAAAB PAAAAAB 0 0 1 2

Note that if some IDs in FILE1 are found neither in FILE2 or FILE3, I would like to retain the original value. Furthermore it is crucial that the order of the output is not changed. It would be fine to solve the problem in two steps, first comparing FILE1 to FILE2 and then FILE1 to FILE3.

I feel that my problem might potentially be solved by running sed in a for-loop but so far I have been unsuccessful. Thank you very much for your help.

Original Q&A

There are 1 answers

**123** · Accepted Answer · 2015-06-08T09:24:59+00:00

123 On 08 June 2015 at 09:24 BEST ANSWER

In awk, using ARGIND

awk 'ARGIND~"1|2"{a[$1]=ARGIND;next}a[$1]{$NF=a[$1]}1' FILE2 FILE3 FILE1

PAAXXXX PAAXXXX 0 0 1 -9
PAAXXXY PAAXXXY 0 0 1 -9
PAAXXYX PAAXXYX 0 0 2 1
PAAXYXX PAAXYXX 0 0 2 1
PAAYXXX PAAYXXX 0 0 1 1
PAAYYXX PAAYYXX 0 0 1 -9
PAAYYYX PAAYYYX 0 0 2 2
PAAAAAB PAAAAAB 0 0 1 2

TechQA.

Bash: Replace values of a column while retaining line order.

There are 1 answers

Related Questions in BASH

Related Questions in FOR-LOOP

Related Questions in SED

Related Questions in STRING-COMPARISON

Popular Questions

Popular Tags

Trending Questions