Match functions with more than two arguments with Regex

I want to write a regular expression in Python that matches functions with more than two arguments, such that the following expressions match:

function(1, 2, 3)
function(1, 2, 3, 4)
function(1, function(1, 2), 3)
function(function(function(1, 2), 2), 2, 3, 4)

but the following don't:

function(1, 2)
function(1, function(1, function(1, 2)))
function(1, function(function(1), 2))

My best attempt was the following expression, which only works for the cases without nested functions:

\w+\((?:.*,){2,}.*\)

What expression should I use instead?
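
To make the failure concrete, here is a minimal check of the attempt above using the standard re module. Because .* also crosses parentheses, commas at any nesting depth are counted, so two of the three expressions that should not match are accepted anyway. The variable names are illustrative only.

```python
import re

# The attempt from the question: at least two commas anywhere after "name(".
attempt = re.compile(r"\w+\((?:.*,){2,}.*\)")

should_not_match = [
    "function(1, 2)",
    "function(1, function(1, function(1, 2)))",
    "function(1, function(function(1), 2))",
]

# Expressions that the attempt wrongly accepts.
false_positives = [s for s in should_not_match if attempt.search(s)]
print(false_positives)
```

Only the first expression is rejected; the two nested ones are reported as matches.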

There are 2 answers

Answer by Timeless

For fun, you can recursively replace the right-most nested function call with a placeholder (e.g., a hyphen) and, once only the outermost call remains, count its arguments:

import re

MIN_ARGS = 3

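# Python identifier; see https://peps.python.org/pep-0008/#function-and-variable-names
pyfunc = r"\b[a-zA-Z_][a-zA-Z0-9_]*"

# \1: everything up to the right-most nested call, \2: that call, \3: the rest
pat = fr"({pyfunc}\(.*)(?:{pyfunc})?({pyfunc}\(.+?\))(.+)"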

def fn(s):
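    # Collapse the right-most nested call into "-" and recurse until nothing
    # changes (this also avoids endless recursion on arguments like x or y).
    new_s = re.sub(pat, r"\1-\3", s)
    if new_s != s:
        return fn(new_s)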
    else:
        return s
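
To see what the substitution does at each pass, the sketch below records every intermediate string. It repeats the two patterns from above so that it runs on its own; reduce_steps is a hypothetical helper name, and the loop is an iterative restatement of the recursion, not a different method.

```python
import re

pyfunc = r"\b[a-zA-Z_][a-zA-Z0-9_]*"
pat = fr"({pyfunc}\(.*)(?:{pyfunc})?({pyfunc}\(.+?\))(.+)"

def reduce_steps(s):
    """Return every intermediate string produced while collapsing nested calls."""
    steps = [s]
    while True:
        new_s = re.sub(pat, r"\1-\3", steps[-1])
        if new_s == steps[-1]:  # nothing left to collapse
            return steps
        steps.append(new_s)

for step in reduce_steps("function(function(function(1, 2), 2), 2, 3, 4)"):
    print(step)
# function(function(function(1, 2), 2), 2, 3, 4)
# function(function(-, 2), 2, 3, 4)
# function(-, 2, 3, 4)
```

Note that only the arguments of the outermost call are counted at the end. A nested call with more than two arguments inside a two-argument outer call, such as function(1, function(1, 2, 3)), collapses to function(1, -) and is therefore reported as NO-MATCH.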

Output:

# `text` is the multi-line string holding your expressions

for exp in text.splitlines():
    if (n:=fn(exp).count(",")) > MIN_ARGS-2:
        print(f"{exp:<50}", n+1, f"{'MATCH':>10}")
    else:
        print(f"{exp:<50}", n+1, f"{'NO-MATCH':>10}")

function(1, 2, 3)                                  3      MATCH
function(1, 2, 3, 4)                               4      MATCH
function(1, function(1, 2), 3)                     3      MATCH
function(function(function(1, 2), 2), 2, 3, 4)     4      MATCH
function(1, 2)                                     2   NO-MATCH
function(1, function(1, function(1, 2)))           2   NO-MATCH
function(1, function(function(1), 2))              2   NO-MATCH

From the comments:

I have a pandas dataframe with a bunch of expressions like those. I'd like to delete the rows that have function calls with more than two arguments

Then you can use boolean indexing with the same logic as above:

import pandas as pd

df = pd.DataFrame(text.splitlines(), columns=["col"])

# True for rows whose outermost call has at most two arguments
mask = [fn(exp).count(",") <= MIN_ARGS-2 for exp in df["col"]]

out = df.loc[mask]

Output:

print(out)

                                        col
4                            function(1, 2)
5  function(1, function(1, function(1, 2)))
6     function(1, function(function(1), 2))

Answer by tshiono

An approach using recursion with the regex module from PyPI:

  • Definition of "one argument" (referenced as (?2)):
    a string that contains neither commas nor parentheses, or a function call.
    (Known limitation: an argument such as a * (b + c) is not handled yet.)

    • expressed by: ([^(),]+|(?3))
    • then expanded as: ([^(),]+|(\b\w+\s*\((?:([^(),]+|(?3))(?:\s*,\s*(?2))*)*\)))
  • Definition of "arguments":
    a comma-separated list of zero or more arguments, i.e. (?2);
    each comma may be preceded or followed by zero or more spaces.

    • expressed by: (?:(?2)(?:\s*,\s*(?2))*)*
  • Definition of "function call" (referenced as (?3)):
    function name + zero or more spaces + '(' + arguments + ')'

    • expressed by: (\b\w+\s*\((?:([^(),]+|(?3))(?:\s*,\s*(?2))*)*\))

Then the desired pattern, a function with 3 or more arguments, can be expressed by:
(\b\w+\s*\(([^(),]+|(\b\w+\s*\((?:([^(),]+|(?3))(?:\s*,\s*(?2))*)*\)))(?:\s*,\s*(?2)){2,}\))
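
The same grammar may be easier to read with named groups in verbose mode. The sketch below is a restatement under assumptions, not the exact pattern above: the group names arg, args and call are made up for illustration, and it relies on the (?(DEFINE)...) and (?&name) constructs of the third-party regex module.

```python
import regex  # third-party module: pip install regex

pattern = regex.compile(r"""
    (?(DEFINE)
        (?P<arg>  [^(),]+ | (?&call) )                      # plain token or nested call
        (?P<args> (?: (?&arg) (?: \s*,\s* (?&arg) )* )? )   # zero or more arguments
        (?P<call> \b\w+\s*\( (?&args) \) )                  # name followed by its argument list
    )
    \b\w+\s*\( (?&arg) (?: \s*,\s* (?&arg) ){2,} \)         # a call with at least three arguments
""", regex.VERBOSE)

lines = [
    "function(1, 2, 3)",
    "function(1, 2, 3, 4)",
    "function(1, function(1, 2), 3)",
    "function(function(function(1, 2), 2), 2, 3, 4)",
    "function(1, 2)",
    "function(1, function(1, function(1, 2)))",
    "function(1, function(function(1), 2))",
]

# Keep the lines that contain a call with three or more arguments.
matched = [line for line in lines if pattern.search(line)]
print(matched)
```

Testing one line at a time also keeps [^(),]+ from running across line breaks.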

Code:

import regex

text = '''
function(1, 2, 3)
function(1, 2, 3, 4)
function(1, function(1, 2), 3)
function(function(function(1, 2), 2), 2, 3, 4)
function(1, 2)
function(1, function(1, function(1, 2)))
function(1, function(function(1), 2))
'''

pat = r'(\b\w+\s*\(([^(),]+|(\b\w+\s*\((?:([^(),]+|(?3))(?:\s*,\s*(?2))*)*\)))(?:\s*,\s*(?2)){2,}\))'

m = regex.findall(pat, text)
if m:
    print([x[0] for x in m])    # findall returns one tuple of groups per match; group 1 is the whole call

Output:

['function(1, 2, 3)', 'function(1, 2, 3, 4)', 'function(1, function(1, 2), 3)', 'function(function(function(1, 2), 2), 2, 3, 4)']
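
The definition of "one argument" above excludes arguments that carry their own parentheses, such as a * (b + c). If that case matters, one alternative is to drop the regex and count commas per parenthesis depth instead. This is a different technique from the pattern above, sketched with the standard library only; call_arities is a hypothetical helper, and treating a parenthesis as a call whenever it directly follows a letter, digit or underscore is a simplifying assumption.

```python
def call_arities(expr):
    """Return the argument count of every function call in expr, innermost first."""
    arities = []
    stack = []   # one [is_call, commas, has_content] entry per open parenthesis
    prev = ""    # last non-space character seen so far
    for ch in expr:
        if ch == "(":
            if stack:
                stack[-1][2] = True              # the enclosing parenthesis is not empty
            is_call = prev.isalnum() or prev == "_"
            stack.append([is_call, 0, False])
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced parentheses")
            is_call, commas, has_content = stack.pop()
            if is_call:
                arities.append(commas + 1 if has_content else 0)
        elif ch == ",":
            if stack:
                stack[-1][1] += 1                # a comma at the current depth
        elif not ch.isspace() and stack:
            stack[-1][2] = True
        if not ch.isspace():
            prev = ch
    return arities

def has_wide_call(expr, min_args=3):
    """True if any call in expr has at least min_args arguments."""
    return any(n >= min_args for n in call_arities(expr))

print(call_arities("function(1, function(1, 2), 3)"))   # [2, 3]
print(has_wide_call("function(a * (b + c), 2)"))        # False
```

Unlike a check on the outermost call only, this reports a wide call at any nesting depth, so function(1, function(1, 2, 3)) counts as a match here.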