I have two time-series files and would like to find all lines that match each other, where only the text up to the first tab has to match and the values after it can differ. In the vimdiff view below, I would like to get rid of the days that occur in only one of the two time series.
I am looking for the simplest Unix tool to do this.


Simple example
Input
Left file                    Right file
------------------------     ------------------------
10-Apr-00 00:00   0     ||   10-Apr-00 00:00   7
20-Apr-00 00:00   7     ||   21-Apr-00 00:00   3
Output
Left file                    Right file
------------------------     ------------------------
10-Apr-00 00:00   0     ||   10-Apr-00 00:00   7
Let's consider these sample input files:
To merge together those lines with the same date:
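A sketch of the merge, assembled from the two awk rules quoted in the explanation (not necessarily the exact original command). It assumes tab-separated files named file1 and file2 holding the question's example rows. The -F'\t' option is an assumption added so that $1 is everything up to the first tab, as the question asks.

```shell
#!/bin/sh
# Sketch: join two tab-separated time series on the text before the first tab.
# Assumption: file1/file2 mirror the question's example (timestamp<TAB>value).
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '10-Apr-00 00:00\t0\n20-Apr-00 00:00\t7\n' > file1
printf '10-Apr-00 00:00\t7\n21-Apr-00 00:00\t3\n' > file2

# -F'\t' makes $1 the whole "date time" stamp before the first tab.
awk -F'\t' '
  NR == FNR { a[$1] = $0; next }              # first file: store each line under its key
  { if ($1 in a) print a[$1] "\t||\t" $0 }    # second file: print matched pairs
' file1 file2
# Prints one line: the 10-Apr-00 row of file1, tab-||-tab, the 10-Apr-00 row of file2.
```

Choosing -F'\t' keys the match on the full "10-Apr-00 00:00" stamp. Without it, awk splits on blanks and $1 is only "10-Apr-00", so lines would be matched by day alone, regardless of the time of day.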
Explanation
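A small illustration, with hypothetical two-line files, of the NR and FNR counters that the first rule relies on. NR keeps counting across all input files while FNR restarts at 1 for each file, so NR==FNR holds only while awk reads the first file.

```shell
#!/bin/sh
# Show how NR (global line count) and FNR (per-file line count) evolve.
# file1/file2 here are throwaway examples in a temporary directory.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'a\nb\n' > file1
printf 'c\nd\n' > file2

# NR == FNR is true only while the first file is being read.
awk '{ print FILENAME, NR, FNR, (NR == FNR ? "first file" : "later file") }' file1 file2
# file1 1 1 first file
# file1 2 2 first file
# file2 3 1 later file
# file2 4 2 later file
```

One caveat of this idiom: if the first file is empty, NR and FNR are still equal while the second file is read, so every line of the second file would be handled by the first rule.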
NR==FNR{a[$1]=$0;next;}

NR is the number of lines read so far and FNR is the number of lines read so far from the current file. So, when NR==FNR, we are still reading the first file. If so, save this whole line, $0, in array a under the key of the first field, $1, which is the date. Then, skip the rest of the commands and jump to the next line.

if ($1 in a) print a[$1]"\t||\t"$0

If we get here, then we are reading the second file, file2. If the first field on this line, $1, is a date that we already saw in file1, in other words, if $1 in a, then print this line out together with the corresponding line from file1. The two lines are separated by tab-||-tab.

Alternative Output
If you just want to select the lines from file2 whose dates are also in file1, then the code can be simplified:
Or, still simpler:
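Two ways the simplified filter can be written (a sketch, not necessarily the original answer's code), again assuming tab-separated files named file1 and file2 with the question's example rows. Both rely on the awk rule that a pattern with no action prints the current line.

```shell
#!/bin/sh
# Sketch: keep only the lines of file2 whose key (text before the first tab)
# also appears in file1. Sample data is assumed, mirroring the question.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '10-Apr-00 00:00\t0\n20-Apr-00 00:00\t7\n' > file1
printf '10-Apr-00 00:00\t7\n21-Apr-00 00:00\t3\n' > file2

# Variant 1: same first rule as the merge; "$1 in a" alone prints matching lines.
awk -F'\t' 'NR == FNR { a[$1] = $0; next } $1 in a' file1 file2

# Variant 2: only the keys are needed, so merely referencing a[$1]
# creates the array element without storing the whole line.
awk -F'\t' 'NR == FNR { a[$1]; next } $1 in a' file1 file2
```

Running the same command with the arguments swapped (file2 file1) filters the left file instead; doing both yields the two trimmed files shown in the question's Output.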