Sed: snake case functions

I need a sed script to automatically convert C functions to lower snake case.

What I have so far is the following which will separate camel case words with underscores, but it doesn't lower case them and it affects everything.

sed -i -e 's/\([a-z0-9]\)\([A-Z]\)/\1_\L\2/g' `find source/ -type f`

How do I make it apply only to function names, i.e. only to identifiers followed by the character '('?

Also, what do I need to make the strings go lower case?

For example, if I have this code:

void destroyPoolLender(PoolLender *lender)
{
    while (!isListEmpty(&lender->pools)) {
        MemoryPool *myPool = listPop(&this->pool);

        if (pool->inUse) {
            logError("%s memory pool still in use. Pool not released.", pool->lenderName);
        } else {
            free(pool);
        }
    }
    listDestroy(&this->pool);
}

It should look like this once converted:

void destroy_pool_lender(PoolLender *lender)
{
    while (!is_list_empty(&lender->pools)) {
        MemoryPool *myPool = list_pop(&this->pool);

        if (pool->inUse) {
            log_error("%s memory pool still in use. Pool not released.", pool->lenderName);
        } else {
            free(pool);
        }
    }
    list_destroy(&this->pool);
}

Notice how myPool is untouched because it isn't a function name.

There are 2 answers

Toby Speight

We can do this with sed. The trick is to match everything up to and including the ( as capture group 2, and use \l rather than \L, to downcase only the first matched character:

s/\([a-z0-9]\)\([A-Z][A-Za-z0-9]*(\)/\1_\l\2/
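
To see the difference between \l and \L, it may help to run the substitution once with each on a name from the question. This is a minimal sketch assuming GNU sed (both escapes are GNU extensions); the variable names are only for illustration.

```shell
# Assumes GNU sed: \l and \L are GNU extensions.
input='destroyPoolLender('

# \l lowercases only the next character of the replacement.
lower_first=$(printf '%s\n' "$input" |
    sed 's/\([a-z0-9]\)\([A-Z][A-Za-z0-9]*(\)/\1_\l\2/')

# \L lowercases everything that follows it in the replacement.
lower_all=$(printf '%s\n' "$input" |
    sed 's/\([a-z0-9]\)\([A-Z][A-Za-z0-9]*(\)/\1_\L\2/')

printf '%s\n' "$lower_first" "$lower_all"
# destroy_poolLender(
# destroy_poollender(
```

With \L the capital in "Lender" is flattened, so the second word boundary is lost; with \l it survives and can be handled by a later pass.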

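We can't just use the /g modifier: each match extends to the (, so the matches for successive capitals in the same name overlap, and /g never rescans text it has already matched. Instead, repeat the substitution in a loop until it no longer matches: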

#!/bin/sed -rf

:loop
s/([a-z0-9])([A-Z][A-Za-z0-9]*\()/\1_\l\2/
tloop

(I used -r for GNU sed to reduce the number of backslashes I needed).
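
The effect of the overlap can be checked by comparing a single pass with /g against the loop. This is a small sketch, again assuming GNU sed; the variable names are made up for the example.

```shell
# Assumes GNU sed (for \l).
pattern='s/\([a-z0-9]\)\([A-Z][A-Za-z0-9]*(\)/\1_\l\2/'
input='callMyFunction(x)'

# Single pass with /g: the first match consumes "lMyFunction(",
# so the capital in "Function" is never examined again.
with_g=$(printf '%s\n' "$input" | sed "${pattern}g")

# Loop: t branches back to the label for as long as the substitution succeeds.
with_loop=$(printf '%s\n' "$input" | sed -e ':loop' -e "$pattern" -e 'tloop')

printf '%s\n' "$with_g" "$with_loop"
# call_myFunction(x)
# call_my_function(x)
```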

A further simplification is to match a non-word-boundary; this removes the need for two capture groups:

#!/bin/sed -rf

:loop
s/\B[A-Z]\w*\(/_\l&/
tloop

Demo:

$ sed -r ':loop;s/\B[A-Z]\w*\(/_\l&/;tloop' \
          <<<'SomeType *myFoo = callMyFunction(myBar, someOtherFunction());'
SomeType *myFoo = call_my_function(myBar, some_other_function());

Note that this only modifies function calls and definitions - it can be hard to identify which names are functions, if you're storing or passing function pointers. You might choose to fix those up manually (reacting to compilation errors) if you only have 70k lines to deal with. If you're working with 1M+, you might want a proper refactoring tool.
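
The function-pointer caveat can be reproduced with a one-line input. In this sketch, registerCallback and onExit are hypothetical names and snake_calls is just a wrapper around the script above; GNU sed is assumed for \B, \w and \l.

```shell
# Wrapper around the loop from the answer (GNU sed assumed).
snake_calls() {
    sed -r -e ':loop' -e 's/\B[A-Z]\w*\(/_\l&/' -e 'tloop'
}

# onExit appears once as an argument (no following parenthesis) and once as a call.
result=$(printf '%s\n' 'registerCallback(onExit); onExit();' | snake_calls)

printf '%s\n' "$result"
# register_callback(onExit); on_exit();
```

The call is renamed but the argument is not, so the two no longer agree. This is the kind of leftover the compiler will point out.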

MiniMax

A solution for bash. It takes the function names from the object files, using the nm command (see man nm).

To create the object files, run gcc with the -c option on each source file. If you already have them, for example from a make build, you can skip this step:

gcc -c one.c -o one.o
gcc -c two.c -o two.o

Usage: ./convert.sh one.o two.o

#!/bin/bash

# Collect the original function names.
orig_func_names=$(
    # List the symbols of all object files in System V format.
    nm -f sysv "$@" |
    # Keep the FUNC rows and strip everything after the name
    # (the name ends at the first whitespace or '|').
    sed -n '/FUNC/s/[|[:space:]].*//p' |
    # Keep only the names that contain an uppercase letter.
    sed -n '/[A-Z]/p'
)

# Convert the camel case names to snake case (\l is a GNU sed extension).
new_func_names=$(sed 's/[A-Z]/_\l&/g' <<< "$orig_func_names")

# Create a file of substitute commands for sed, one line per function name.
# Example of the commands in this file:
# s/\boneTwo\b/one_two/g
# s/\boneTwoThree\b/one_two_three/g
# The variables are deliberately unquoted so that each name becomes
# a separate printf argument.
paste -d'/' <(printf 's/\\b%s\\b\n' ${orig_func_names}) <(printf '%s/g\n' ${new_func_names}) > command_file.txt

# Do the conversion.
# Map each object file name to its C source by replacing a trailing '.o' with '.c':
# one.o two.o three.o  ->  one.c two.c three.c
# sed edits the source files in place and keeps a backup of each one
# with the suffix '_backup'.
sed -i_backup -f command_file.txt "${@/%.o/.c}"
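
The extraction and command-generation steps can be tried without compiling anything. The sketch below feeds a hand-written imitation of nm -f sysv output (illustrative, not captured from a real run, and real output may differ in detail) through the same kind of filter. It builds the commands with a plain loop instead of paste, purely for readability, and assumes GNU sed for \l and \b.

```shell
# Hand-written imitation of `nm -f sysv` rows; not real tool output.
sample_nm_output() {
cat <<'EOF'
one                 |0000000000000000|   T  |              FUNC|0000000000000017|     |.text
oneTwo              |0000000000000017|   T  |              FUNC|0000000000000025|     |.text
puts                |                |   U  |            NOTYPE|                |     |*UND*
EOF
}

# Keep FUNC rows, cut at the first whitespace or '|',
# then keep only names containing an uppercase letter.
names=$(sample_nm_output |
    sed -n '/FUNC/s/[|[:space:]].*//p' |
    sed -n '/[A-Z]/p')

# Emit one substitute command per remaining name.
commands=$(printf '%s\n' "$names" | while IFS= read -r name; do
    snake=$(printf '%s\n' "$name" | sed 's/[A-Z]/_\l&/g')
    printf 's/\\b%s\\b/%s/g\n' "$name" "$snake"
done)

# Apply the generated commands to one line of C.
converted=$(printf '%s\n' 'int oneTwo() { one(); }' | sed "$commands")

printf '%s\n' "$commands" "$converted"
# s/\boneTwo\b/one_two/g
# int one_two() { one(); }
```

Only oneTwo survives the filters: one has no uppercase letter, and puts is not a FUNC row.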

Note that the running time of this solution grows with the product of the number of source lines and the number of function names. For example, with 70,000 lines and 1,000 functions, sed makes 70 million substitution attempts (70,000 lines * 1,000 commands). It would be interesting to know how long that takes.


Testing

Input

file one.c

#include <stdio.h>

int one(); 
int oneTwo(); 
int oneTwoThree();
int oneTwoThreeFour();

int one() {
    puts("");
    return 0;
}

int oneTwo() {
    printf("%s", "hello");
    one();
    return 0;
}

int oneTwoThree() {
    oneTwo();
    return 0;   
}

int oneTwoThreeFour() {
    oneTwoThree();
    return 0;   
}

int main() {

    return 0;
}

file two.c

#include <stdio.h>

int two() {
    return 0; 
}

int twoThree() {
    two();
    return 0;
}   

int twoThreeFour() {
    twoThree();
    return 0;    
}   

Output

file one.c

#include <stdio.h>

int one(); 
int one_two(); 
int one_two_three(); 
int one_two_three_four(); 

int one() {
    puts("");
    return 0;   
}

int one_two() {
    printf("%s", "hello");
    one();
    return 0;   
}

int one_two_three() {
    one_two();
    return 0;   
}

int one_two_three_four() {
    one_two_three();
    return 0;   
}

int main() {

    return 0;
}

file two.c

#include <stdio.h>

int two() {
    return 0;  
}

int two_three() {
    two();
    return 0;
}   

int two_three_four() {
    two_three();
    return 0;    
}