Generate shorthand function signature

428 views Asked by At

Let's assume a function foo() with the following four overloads:

foo(a, b)
foo(a, b, d)
foo(a, c)
foo(a, c, d)

I want to generate a concise string that represents all overloads at once. In this case, the result should be foo(a, (b|c), [d]).

Edit: There is usually more than one concise representation. My goal is to get a representation that is as short as possible, counting only parameters. So foo(a, (b|c), [d]) has length 4 and is thus better than foo(a, ((b, [d])|(c, [d]))), which has length 5.

Is there an existing algorithm to solve this (or a similar) problem?

If not, can anyone sketch an approach?

I'm not picky about the programming language (I'm using C#, though).

The rules are:

  • Parameters with the same name represent the same thing for all overloads. a is a, b is b...
  • When collecting all distinct parameters over all overloads (in this case, a, b, c, d), every overload will adhere to this parameter order.
  • [...] means that the enclosed sub-expression can be omitted as a whole.
  • (...|...|...) means a choice of one of the sub-expressions. For readability's sake, such a sub-expression must not be empty.

To illustrate further: The (rather contrived) function bar()

bar(a, b,          f, g, h, i)
bar(a, b,          f, g, h)
bar(a, b,          f, g)

bar(a,    c,          g, h, i)
bar(a,    c,          g, h)
bar(a,    c,          g)

bar(a,       d,    f, g, h, i)
bar(a,       d,    f, g, h)
bar(a,       d,    f, g)

bar(a,          e, f, g, h, i)
bar(a,          e, f, g, h)
bar(a,          e, f, g)

should be represented as bar(a, (((b|d|e), f)|c), g, [h, [i]]).

3

There are 3 answers

0
Daniel Wolf On BEST ANSWER

First, let's assign some nomenclature.

  • [...] is an Option.
  • ..., ... is a Sequence.
  • ... | ... is a Choice.

It seems the problem is tricky for two reasons. First, the syntax just isn't the same as in a Boolean expressions. For instance, although a Choice is similar to an OR, it means "take any one", not "take at least one". So an algorithm that generates an optimal Boolean expression might result in a suboptimal result once "translated" to our syntax.

Second, the optimal solution might be something like a Sequence within a Choice within a Sequence within an Option. So any algorithm that can only create one kind of structure (like a Choice of Sequences) cannot possibly always return an optimal solution.

The following describes the solution I've found. There is also a working implementation.

First, we need to create a list of all distinct parameters over all overloads. As per the question, each overload will stick to this parameter order. So each overload can be represented as a Boolean array where each entry indicates whether the corresponding parameter is present. Now, the parameter list along with the list of overloads is given to a recursive function that works like this:

  1. Remove duplicate overloads, so that each overload is distinct.
  2. If there is only one overload, return a Sequence of the used parameters.
  3. If one of the overloads is empty: call the function recursively with all other overloads and return the result within an Option.
  4. Split the parameter list into constant areas (that are the same for all overloads) and independent areas. Each independent area should be kept as short as possible. An area is independent if you can take any overload and replace the flags within that area with those from any other overload. If this splitting results in at least one constant area, return a Sequence containing the constant parts and recurse for the independent areas.
  5. If all this fails, it may be because the overloads are too dissimilar. So create multiple groups of similar overloads. To do this by brute force, generate all partitions (see How to find all partitions of a set). For each group of overloads within each partition, call the function recursively and combine the results with a Choice. Then return the shortest of these Choices.

I believe that for the reasons stated above, an algorithm that finds optimal solutions cannot be much simpler. I cannot prove that this algorithm does in fact always find the optimal solution, but for the signatures I've tried so far, it did.

2
MaMazav On

Actually that problem can be reduced to simplifying logic circuit. You can use Karnaugh map to perform the simplification:

http://en.wikipedia.org/wiki/Karnaugh_map

Edit: the circuit minimization problem: http://en.wikipedia.org/wiki/Circuit_minimization

The reduction from the overloading problem to circuit minimization based on the assumption that no order change exist between the function parameter. The reduction is performed by writing a True Table in which the input parameters of the circuit are exactly all the possible parameters of the function, and for each existing overloading the output of the circuit will be '1' for the row in which all (and exactly) the used parameters of the overloading are '1'.

1
MaMazav On

I don't know if there is a standard way to solve it, but here is a suggested heuristic. Notice that I don't consider performance in this suggestion.

  1. You can always represent such overloads as a "trivial" form of "OR"ed expression of all possible combinations:

    foo( (a, b) | (a, b, d) | (a, c) | (a, c, d) )
    
  2. If you want to extract more simple forms, you can try a greedy algorithm. Start from the trivial form of ORed expressions. Then use the following basic step - compare pairs of expressions to see if they can be grouped by:

    • Or |, . e.g. (a, b) | (a, c) -> (a, (b|c))
    • Optionality [], . e.g. (a, b) | (a, b, d) -> (a, b, [d])

    The basic step should be done:

    • until no grouping is possible.
    • recursively: the algorithm should traverse sub-expressions to check if internal pairs can be grouped.

The above algorithm does not guarantee optimal form. For example, here is a possible execution of the above algorithms on the foo input ('{' are for readability only, they are identical to '(' ):

    (a, b) | (a, b, d) | (a, c) | (a, c, d)
    (a, b, [d]) | (a, c) | (a, c, d)
    (a, b, (c | [d]) | (a, c, d)
    a, {(b, {c | [d]}) | (c, d)}

which is much complicated than the form you presented, a, (b|c), [d]. To make an optimal form of the expression you should first declare what is an optimal form. Based on such declaration you can decide whether you can use the greedy algorithm as a starting point and force the result to be optimized for your needs, or you should have another algorithm at all.

Here are to demonstrations of how you can get more optimized forms:

  1. The algorithm can be forced to group by OR expressions before optionality. Then the above execution is invalid, and a typical execution will be like:

    (a, b) | (a, b, d) | (a, c) | (a, c, d)
    {a, (b | c)} | (a, c) | (a, c, d)
    {a, (b | c)} | (a, b, d) | (a, c, d)
    a, { (b | c) | (b, d) | (c, d) }
    a, { (b | c) | (b | c, d) }
    a, { (b | c) , [d] }
    
  2. The algorithm can be back-traced for all possible orders of grouping operations to find the most optimal form exist.