unix tools to parse file on the command line

I have a python script that looks like the following that I want to transform:

import sys
# more imports


''' some comments '''

class Foo:
   def _helper1():
      etc.

   def _helper2():
      etc.

   def foo1():
      d = { a:3, b:2, c:4 }
      etc.

   def foo2():
      d = { a:2, b:2, c:7 }
      etc.

   def foo3():
      d = { a:3, b:2, c:7 }
      etc.

   etc.

   if __name__ == "__main__":    
      etc.

I'd like to parse JUST the foo*() functions and keep only the ones that have certain attributes, like d={a:3, b:2}. Everything that is not a foo*() function must be kept so the transformed script will still run. The foo*() functions will be well defined, though d may have different keys and values.

Is there some set of unix tools I can use to do this through chaining? I can use grep to identify foo but how would I scan the next couple of lines to apply the keep or reject portion of my logic?

Edit: note, I'm trying to see if it's reasonable to do this with command-line tools before writing a custom parser. I know how to write the parser.

There is 1 answer

Prune On

You haven't specified your problem with enough detail to recommend a particular solution, but there are many tools and techniques that will handle this type of problem.

As I understand this, you want to

  1. Identify the boundaries of your class
  2. Identify the methods within the class
  3. Remove the methods lacking certain textual features
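
If the real file is valid Python, steps 1 and 2 can also be handed to Python's own parser rather than done by text matching. This is a swapped-in technique, not the flag-based approach described in this answer, and the function name `method_spans` is made up for illustration. A minimal sketch, assuming Python 3.8 or later (for `end_lineno`):

```python
import ast


def method_spans(source, class_name):
    """Return {method_name: (first_line, last_line)} for one class.

    Line numbers are 1-based and refer to the `def` line through the last
    line of the method body. Assumes `source` parses as valid Python.
    """
    tree = ast.parse(source)
    spans = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            # Only look at functions defined directly in the class body.
            for item in node.body:
                if isinstance(item, ast.FunctionDef):
                    spans[item.name] = (item.lineno, item.end_lineno)
    return spans
```

With the line ranges in hand, step 3 becomes a matter of searching each range for the target text and skipping the ranges that lack it.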

My general approach to this would be a script with logic based on "open old and new files; write everything you read from the old file, unless it belongs to a method that lacks the target text."

You can blithely write things until you get into the class (one flag) and start finding methods (another flag). The one slightly tricky part here is the buffering: you need to keep the text of each method until you know whether it contains the target text. You can either read in the entire method (minor parsing task) and search that for the target, or simply hold lines of text until you find the target (then return to your write-it-all mode) or run off the end (empty the buffer without writing).
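
The hold-lines-until-you-know variant might look roughly like the sketch below. The names `filter_methods`, `METHOD_RE` and `TARGET_RE` are hypothetical, and so is the target pattern; it also assumes methods are indented inside a class and that a method ends at the next non-blank line indented no deeper than its `def`.

```python
import re

# Assumed patterns, for illustration only: an indented "def name(" line,
# and the text a foo* method must contain in order to be kept.
METHOD_RE = re.compile(r"^([ \t]+)def (\w+)\(")
TARGET_RE = re.compile(r"\ba\s*:\s*3\s*,\s*b\s*:\s*2\b")


def filter_methods(lines, prefix="foo"):
    """Copy lines, dropping methods named prefix* that lack the target."""
    out = []
    buffer = []      # lines of the method currently being held back
    holding = False  # True while inside a candidate method, target not yet seen
    indent = 0       # indentation width of the held method's def line

    for line in lines:
        if holding:
            cur_indent = len(line) - len(line.lstrip())
            if line.strip() and cur_indent <= indent:
                # Ran off the end of the method: empty the buffer unwritten,
                # then handle this line as ordinary input below.
                buffer = []
                holding = False
            elif TARGET_RE.search(line):
                # Target found: flush the buffer, return to write-it-all mode.
                out.extend(buffer)
                out.append(line)
                buffer = []
                holding = False
                continue
            else:
                buffer.append(line)
                continue

        m = METHOD_RE.match(line)
        if m and m.group(2).startswith(prefix):
            holding = True
            indent = len(m.group(1))
            buffer = [line]
        else:
            out.append(line)

    # A method still held at end of input never matched, so it is dropped.
    return out
```

To use it on a file, something like `sys.stdout.writelines(filter_methods(open(path)))` would do, where `path` is whatever script you are transforming. Note that a dropped method takes its trailing blank line with it, since that line is still in the buffer when the method ends.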

This is simple enough that you could cobble together a script in any handy language to handle the problem. UNIX provides a variety of tools; in that paradigm I'd use awk. However, I recommend a more readable tool, such as Python or Perl. If you want to move formally into the world of parsing, I suggest a trivial Lex-YACC pairing: you can have very simple tokens (perhaps even complete lines, depending on your coding style) and actions (write line, hold line, set status flag, flush buffer, etc.).

Is that enough to get you moving?