I have a file, FILE1, of some 19.000 lines that has the following format:
PAAXXXX PAAXXXX 0 0 1 -9
PAAXXXY PAAXXXY 0 0 1 -9
PAAXXYX PAAXXYX 0 0 2 -9
PAAXYXX PAAXYXX 0 0 2 -9
PAAYXXX PAAYXXX 0 0 1 -9
PAAYYXX PAAYYXX 0 0 1 -9
PAAYYYX PAAYYYX 0 0 2 -9
PAAAAAB PAAAAAB 0 0 1 -9
The IDs in columns 1 and 2 are the same in each case, col 3 and 4 are always 0, col 5 is either 1 or 2 and col 6 is always -9.
I have two more files, FILE2 and FILE3, which contain only IDs and do not overlap. For example:
head -3 FILE2
PAAXXYX
PAAXYXX
PAAYXXX
head -2 FILE3
PAAYYYX
PAAAAAB
I would like to change the value of col6 in FILE1 based on overlapping of col1 or col2 with the IDs in files 2 and 3. Specifically, I would like to assign 1 to column 6 for all IDs found in FILE2 and 2 for all IDs found in FILE3. My final output might therefore look something like this:
PAAXXXX PAAXXXX 0 0 1 -9
PAAXXXY PAAXXXY 0 0 1 -9
PAAXXYX PAAXXYX 0 0 2 1
PAAXYXX PAAXYXX 0 0 2 1
PAAYXXX PAAYXXX 0 0 1 1
PAAYYXX PAAYYXX 0 0 1 -9
PAAYYYX PAAYYYX 0 0 2 2
PAAAAAB PAAAAAB 0 0 1 2
Note that if some IDs in FILE1 are found neither in FILE2 or FILE3, I would like to retain the original value. Furthermore it is crucial that the order of the output is not changed. It would be fine to solve the problem in two steps, first comparing FILE1 to FILE2 and then FILE1 to FILE3.
I feel that my problem might potentially be solved by running sed in a for-loop but so far I have been unsuccessful. Thank you very much for your help.
In awk, using ARGIND