GREP - finding all occurrences of a string

2.2k views Asked by At

I am tasked with white labeling an application so that it contains no references to our company, website, etc. The problem I am running into is that I have many different patterns to look for and would like to guarantee that all patterns are removed. Since the application was not developed in-house (entirely) we cannot simply look for occurrences in messages.properties and be done. We must go through JSP's, Java code, and xml.

I am using grep to filter results like this:

grep SOME_PATTERN . -ir | grep -v import | grep -v // | grep -v /* ...

The patterns are escaped when I'm using them on the command line; however, I don't feel this pattern matching is very robust. There could possibly be occurrences that have import in them (unlikely) or even /* (the beginning of a javadoc comment).

All of the text output to the screen must come from a string declaration somewhere or a constants file. So, I can assume I will find something like:

public static final String SOME_CONSTANT = "SOME_PATTERN is currently unavailable";

I would like to find that occurrence as well as:

public static final String SOME_CONSTANT = "
SOME_PATTERN blah blah blah";

Alternatively, if we had an internal crawler / automated tests, I could simply pull back the xhtml from each page and check the source to ensure it was clean.

2

There are 2 answers

1
psihodelia On BEST ANSWER

I would use sed, not grep! Sed is used to perform basic text transformations on an input stream. Try s/regexp/replacement/ option with sed command.

You can also try awk command. It has an option -F for fields separation, you can use it with ; to separate lines of you files with ;.

The best solution will be however a simple script in Perl or in Python.

3
grossvogel On

To address your concern about missing some occurrences, why not filter progressively:

  1. Create a text file with all possible matches as a starting point.
  2. Use filter X (grep for '^import', for example) to dump probable false positives into a tmp file.
  3. Use filter X again to remove those matches from your working file (a copy of [1]).
  4. Do a quick visual pass of the tmp file and add any real matches back in.
  5. Repeat [2]-[4] with other filters.

This might take some time, of course, but it doesn't sound like this is something you want to get wrong...