Omit words in a text

Let's say I have this file (file.txt):

Hello my name is Giorgio,
I would like to go with you
to the cinema my friend

I want to remove the words my, is and I from the text (only the words themselves, not the whole lines).

The words are in a file (words.txt) like this:

my
is
I

So the output must be:

Hello name Giorgio,
would like to go with you
to the cinema friend

How can this be performed?

There are 3 answers

Answer by choroba (best answer)

You can use sed to turn words.txt into a sed script:

sed 's=^=s/=;s=$=//g=' words.txt | sed -f- file.txt

The output differs from the expected one only in whitespace: deleting a word leaves the spaces around it in place. Note that these patterns also match inside longer words.
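To make the mechanism concrete, here is a minimal sketch that prints the script the first sed generates from the question's words.txt. The scratch directory and the variable name generated are illustrative only and not part of the original answer.

```shell
# Illustrative setup: recreate the question's words.txt in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' my is I > "$dir/words.txt"

# The first sed prefixes each line with "s/" and appends "//g",
# so every word becomes a substitute command that deletes that word.
generated=$(sed 's=^=s/=;s=$=//g=' "$dir/words.txt")
printf '%s\n' "$generated"
# s/my//g
# s/is//g
# s/I//g

rm -r "$dir"
```

The second sed then runs these three commands against every line of file.txt.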

To match only whole words, add the word boundaries \b:

s=^=s/\\b=;s=$=\\b//g=
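As a rough end-to-end sketch of the whole-word variant, the following runs it on a made-up line in which "is" and "my" also occur inside longer words. The file names sample.txt and drop.sed and the variable whole_words are hypothetical, the script is written to a file rather than piped through -f-, and \b is assumed to be available (it is a GNU sed extension).

```shell
# Illustrative setup: scratch directory, the question's words.txt,
# and a made-up input line with "is" and "my" inside longer words.
dir=$(mktemp -d)
printf '%s\n' my is I > "$dir/words.txt"
printf '%s\n' 'this is my mystery' > "$dir/sample.txt"

# Build the whole-word script: each word becomes s/\bWORD\b//g.
sed 's=^=s/\\b=;s=$=\\b//g=' "$dir/words.txt" > "$dir/drop.sed"

# Apply it; only the standalone "is" and "my" are deleted.
whole_words=$(sed -f "$dir/drop.sed" "$dir/sample.txt")
printf '%s\n' "$whole_words"

rm -r "$dir"
```

Here "this" and "mystery" survive intact, whereas the version without \b would cut them down to "th" and "stery". The spaces around the deleted words are still left in place.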

Perl solution that also squeezes the spaces (and doesn't care about meta characters):

#!/usr/bin/perl
use warnings;
use strict;

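# Load the words to drop; each one maps to the empty string.
open my $WORDS, '<', 'words.txt' or die $!;
my %words;
chomp, $words{$_} = q() while <$WORDS>;

open my $TEXT, '<', 'file.txt' or die $!;
while (<$TEXT>) {
    # A listed word, together with any space matched around it, is replaced
    # by the empty string; everything else is put back unchanged.
    s=( ?\b(\S+)\b ?)=$words{$2} // $1=ge;
    print;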
}

Answer by 123

A pretty scruffy version in awk. If the list of words contains regex metacharacters, this will break. It does take word boundaries into account, though, so it won't match in the middle of words.

awk 'FNR==NR{a[$1];next}
     {for(i in a)gsub("(^|[^[:alpha:]])"i"([^[:alpha:]]|$)"," ")}1' {words,file}.txt

Hello name Giorgio,
 would like to go with you
to the cinema friend

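It saves the words from the first file as keys of the array a. Then, for each line of the second file, it replaces every saved word, together with the delimiter on either side of it, with a single space. A delimiter is either a character outside [:alpha:] (all alphabetic characters) or the beginning or end of the line, which ensures that only complete words match. The final 1 prints the line.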
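The same two-file idea can also be sketched without regular expressions, by comparing whole fields against the array. This is a different technique from the gsub version above, shown only as an illustration: the array name skip and the variables out and filtered are made up here, and it assumes words are separated by whitespace.

```shell
# Illustrative setup: recreate the question's two files in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' my is I > "$dir/words.txt"
printf '%s\n' 'Hello my name is Giorgio,' \
              'I would like to go with you' \
              'to the cinema my friend' > "$dir/file.txt"

filtered=$(awk '
    # First file: remember every word as an array key.
    FNR == NR { skip[$1]; next }
    # Second file: rebuild each line from the fields that are not listed.
    {
        out = ""
        for (i = 1; i <= NF; i++)
            if (!($i in skip))
                out = (out == "" ? $i : out " " $i)
        print out
    }' "$dir/words.txt" "$dir/file.txt")
printf '%s\n' "$filtered"

rm -r "$dir"
```

Because the words are compared as plain strings, metacharacters in words.txt are harmless. The trade-offs are that runs of whitespace collapse to single spaces and that a word with punctuation attached, such as "my,", counts as a different field and is kept.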

Answer by Jahid

This should do it:

#!/bin/bash
cp file.txt newfile.txt # we will change newfile.txt in place
while IFS= read -r line;do
    [[ $line != "" ]] && sed -i "s/\b$line\b[[:space:]]*//g" newfile.txt # \b on both sides: whole words only
done <words.txt
cat newfile.txt

Or modifying @choroba's sed solution:

sed 's=^=s/\\b=;s=$=\\b[[:space:]]*//g=' words.txt | sed -f- file.txt

Both of the above also strip any whitespace that directly follows a matched word.

Output:

Hello name Giorgio,
would like to go with you
to the cinema friend #There's a space here (after friend)