How can I remove all lines from a text file (main.txt) that also appear in a second text file (removethese.txt)? What is an efficient approach if the files are larger than 10-100 MB? [Using Mac]
Example:
main.txt
3
1
2
5
Remove these lines
removethese.txt
3
2
9
Output:
output.txt
1
5
Example Lines (these are the actual lines I'm working with - order does not matter):
ChIJW3p7Xz8YyIkRBD_TjKGJRS0
ChIJ08x-0kMayIkR5CcrF-xT6ZA
ChIJIxbjOykFyIkRzugZZ6tio1U
ChIJiaF4aOoEyIkR2c9WYapWDxM
ChIJ39HoPKDix4kRcfdIrxIVrqs
ChIJk5nEV8cHyIkRIhmxieR5ak8
ChIJs9INbrcfyIkRf0zLkA1NJEg
ChIJRycysg0cyIkRArqaCTwZ-E8
ChIJC8haxlUDyIkRfSfJOqwe698
ChIJxRVp80zpcEARAVmzvlCwA24
ChIJw8_LAaEEyIkR68nb8cpalSU
ChIJs35yqObit4kR05F4CXSHd_8
ChIJoRmgSdwGyIkRvLbhOE7xAHQ
ChIJaTtWBAWyVogRcpPDYK42-Nc
ChIJTUjGAqunVogR90Kc8hriW8c
ChIJN7P2NF8eVIgRwXdZeCjL5EQ
ChIJizGc0lsbVIgRDlIs85M5dBs
ChIJc8h6ZqccVIgR7u5aefJxjjc
ChIJ6YMOvOeYVogRjjCMCL6oQco
ChIJ54HcCsaeVogRIy9___RGZ6o
ChIJif92qn2YVogR87n0-9R5tLA
ChIJ0T5e1YaYVogRifrl7S_oeM8
ChIJwWGce4eYVogRcrfC5pvzNd4
There are two standard ways to do this:
With grep, using these options:
-v to invert the match.
-x to match whole lines only, to prevent, for example, "he" from matching lines like "hello" or "highway to hell".
-F to use fixed strings, so that each pattern is taken as it is, not interpreted as a regular expression.
-f to get the patterns from another file. In this case, from removethese.
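A command consistent with the four options described above might look like the sketch below. The file names main.txt, removethese.txt and output.txt come from the question; the scratch directory and the sample data (the question's small numeric example) are only there so the snippet can be run as-is.

```shell
# Recreate the question's small example in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 3 1 2 5 > main.txt
printf '%s\n' 3 2 9 > removethese.txt

# -v invert the match, -x whole-line match, -F fixed strings,
# -f read the patterns from removethese.txt.
grep -vxFf removethese.txt main.txt > output.txt

cat output.txt
```

For the sample data this leaves the lines 1 and 5 in output.txt. With real data, only the grep line is needed, run in the directory that holds the two files.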
With awk, store every line of removethese in an array a[]. Then read the main file and print only those lines that are not present in the array.
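One common way to write the awk approach described above is sketched here. It assumes the same file names as the question (main.txt, removethese.txt, output.txt); the scratch directory and sample data are illustrative only.

```shell
# Recreate the question's small example in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 3 1 2 5 > main.txt
printf '%s\n' 3 2 9 > removethese.txt

# FNR==NR holds only while the first file (removethese.txt) is being read:
# each of its lines becomes a key in a[], and "next" skips the print rule.
# For the second file (main.txt), a line is printed only if it is not a key.
awk 'FNR==NR { a[$0]; next } !($0 in a)' removethese.txt main.txt > output.txt

cat output.txt
```

Only the lines of removethese.txt are held in memory; main.txt is streamed line by line. One caveat with the FNR==NR idiom: if removethese.txt is empty, the condition also holds for main.txt, so nothing is printed.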