How to split a C program by its function blocks?


I am trying to split a C program into its function blocks. I tried using the regex library to split on (){, but without success, and I am not sure where to begin. For example, given this input:

string = """
int first(){
    if () { 

    }
}

customtype second(){
    if () { 

    }
    for(){

    }
}
fdfndfndfnlkfe
    """

I want the result to be a list with each function block as an element: ['int first(){ ... }', 'customtype second(){....}']

I tried the following, but I am getting None:

import regex
import re

reg = r"""^[^()\n]+\([^()]*\)\s*
\{
    (?:[^{}]*|(?R))+
\}"""

print(regex.match(reg, string))

There are 2 answers

Jan (best answer)

First of all: don't. Use a parser instead.
Second, if you insist, and to see why you should use a parser instead, have a look at this recursive approach (which only works with the newer regex module, not the standard re module):

^[^()\n]+\([^()]*\)\s*
\{
    (?:[^{}]*|(?R))+
\}

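See a demo on regex101.com. Note that this breaks on comments (or string literals) that contain curly braces. It also only recurses into nested blocks that themselves look like name(...) { at the start of a line, so a bare else { or do { inside a function will prevent a match.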


In Python, this would be:

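import regex as re  # third-party regex module; the standard re module lacks (?R)

# string is the C source text from the question.
reg = re.compile(r"""^[^()\n]+\([^()]*\)\s*
\{
    (?:[^{}]*|(?R))+
\}""", re.VERBOSE | re.MULTILINE)

# Collect each matched function block into a list, then print them.
functions = [match.group(0) for match in reg.finditer(string)]
for function in functions:
    print(function)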
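
One possible way to soften the comment limitation mentioned above is to blank out comments and literals before matching. The following is only a minimal sketch using the standard re module. The name mask_comments_and_literals is hypothetical, and the sketch assumes plain C without trigraphs, raw strings, or preprocessor tricks.

```python
import re

# Alternatives are tried left to right at each position, so a quote inside
# a comment is swallowed by the comment rule and vice versa.
_NOISE = re.compile(
    r"""
    /\*.*?\*/              # block comment (may span lines)
  | //[^\n]*               # line comment
  | "(?:\\.|[^"\\\n])*"    # string literal with backslash escapes
  | '(?:\\.|[^'\\\n])*'    # character literal with backslash escapes
    """,
    re.VERBOSE | re.DOTALL,
)


def mask_comments_and_literals(source):
    """Replace comment and literal text with spaces, keeping newlines and offsets."""
    def blank(match):
        # Keep newlines so line numbers and character offsets stay unchanged.
        return re.sub(r"[^\n]", " ", match.group(0))

    return _NOISE.sub(blank, source)
```

Because the masked text has the same length as the input, the start and end offsets of any match found in the masked text can be used to slice the corresponding block out of the original source.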

Levi Lutz

Parsing source code is a pretty difficult task. Parser generators like Bison produce source code parsers in C, C++, and Java (the generated C code can be called from Python), but you're unlikely to create a regex that solves this problem (at least not easily).
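
Short of a real parser, a rough middle ground is to count braces instead of matching them with a regex. The sketch below is a heuristic, not a C parser. The name split_function_blocks is hypothetical, and it assumes that no braces or semicolons appear inside comments or string literals (or that these have been stripped beforehand). It treats a top-level block as a function when the text before its first { ends with ).

```python
def split_function_blocks(source):
    """Return top-level 'header(...) { ... }' blocks found by counting braces."""
    blocks = []
    depth = 0        # current brace nesting level
    stmt_start = 0   # index where the current top-level statement begins

    for i, ch in enumerate(source):
        if ch == "{":
            depth += 1
        elif ch == "}":
            if depth == 0:
                continue  # unbalanced closing brace: ignore it
            depth -= 1
            if depth == 0:
                candidate = source[stmt_start:i + 1].strip()
                header = candidate.split("{", 1)[0].rstrip()
                # Heuristic: a function header ends with its parameter list.
                # This skips struct/enum bodies and brace initializers.
                if header.endswith(")"):
                    blocks.append(candidate)
                stmt_start = i + 1
        elif ch == ";" and depth == 0:
            # A top-level statement or declaration ended; start afresh.
            stmt_start = i + 1

    return blocks
```

On the sample from the question this yields two elements, one per function, and the trailing junk line is ignored because it contains no braces. Text that precedes a function without a terminating semicolon, such as a preprocessor directive, ends up glued to that function's header, which is one more reason a proper parser is the safer choice.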