I have two time-series files, and I would like to find all lines that match each other up to the first tab; the values after the tab may differ. In the vimdiff below, I would like to get rid of the days that occur in only one of the two time series.
I am looking for the simplest Unix tool to do this.
Simple example
Input
Left file Right File
------------------------ ------------------------
10-Apr-00 00:00 0 || 10-Apr-00 00:00 7
20-Apr-00 00:00 7 || 21-Apr-00 00:00 3
Output
Left file Right File
------------------------ ------------------------
10-Apr-00 00:00 0 || 10-Apr-00 00:00 7
Let's consider these sample input files:
To merge together those lines with the same date:
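The merge command itself is not reproduced here. The following is a minimal sketch assembled from the two rules explained below, assuming the inputs are named file1 and file2 (the names the explanation uses) and that awk's default whitespace field splitting is kept, so `$1` is just the day (for example `10-Apr-00`). The function name `merge_by_date` and the temporary-directory setup are hypothetical, added only so the command can be run on the question's sample rows.

```shell
# Hypothetical helper: for every date (first field) present in both files,
# print "line from first file <TAB>||<TAB> line from second file".
merge_by_date() {
    awk 'NR==FNR { a[$1] = $0; next }                # first file: store whole line under its date
         { if ($1 in a) print a[$1] "\t||\t" $0 }    # second file: print matching pairs
        ' "$1" "$2"
}

# Sample data from the question, with a tab before the value column.
dir=$(mktemp -d)
printf '10-Apr-00 00:00\t0\n20-Apr-00 00:00\t7\n' > "$dir/file1"
printf '10-Apr-00 00:00\t7\n21-Apr-00 00:00\t3\n' > "$dir/file2"

merge_by_date "$dir/file1" "$dir/file2"
```

Only `10-Apr-00` occurs in both files, so a single merged line is printed. If the key really should be everything before the first tab (date and time, as the question literally asks), passing `-F'\t'` to awk makes `$1` equal to `10-Apr-00 00:00` instead.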
Explanation
NR==FNR{a[$1]=$0;next;}
NR is the number of lines read so far and FNR is the number of lines read so far from the current file. So, when NR==FNR, we are still reading the first file. If so, save this whole line, $0, in array a under the key of the first field, $1, which is the date. Then, skip the rest of the commands and jump to the next line.
if ($1 in a) print a[$1]"\t||\t"$0
If we get here, then we are reading the second file, file2. If the first field on this line, $1, is a date that we already saw in file1 (in other words, if $1 in a), then print this line together with the corresponding line from file1. The two lines are separated by tab-||-tab.
Alternative Output
If you just want to select lines from file2 whose dates are also in file1, then the code can be simplified:
Or, still simpler:
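The two shorter commands are not shown here either. A common awk idiom for this kind of filter, offered as a sketch rather than the answer's exact code, stores only the keys from the first file and relies on awk's default action (print the whole line) when the pattern `$1 in a` is true. Running it once in each direction reproduces the filtered Left/Right output from the question. The helper name `keep_common` and the file names `left` and `right` are hypothetical.

```shell
# Hypothetical helper: print the lines of the second file whose first field
# (the date) also appears as a first field in the first file.
keep_common() {
    awk 'NR==FNR { a[$1]; next }   # first file: referencing a[$1] creates the key
         $1 in a                   # second file: no action given, so the line is printed
        ' "$1" "$2"
}

# Sample data from the question, with a tab before the value column.
dir=$(mktemp -d)
printf '10-Apr-00 00:00\t0\n20-Apr-00 00:00\t7\n' > "$dir/left"
printf '10-Apr-00 00:00\t7\n21-Apr-00 00:00\t3\n' > "$dir/right"

keep_common "$dir/right" "$dir/left"   # left file, restricted to dates also in right
keep_common "$dir/left" "$dir/right"   # right file, restricted to dates also in left
```

One caveat of the NR==FNR test: if the first file is empty, NR equals FNR while the second file is read, so all of its lines are stored as keys and nothing is printed.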