Let's assume a function foo()
with the following four overloads:
foo(a, b)
foo(a, b, d)
foo(a, c)
foo(a, c, d)
I want to generate a concise string that represents all overloads at once. In this case, the result should be foo(a, (b|c), [d])
.
Edit: There is usually more than one concise representation. My goal is to get a representation that is as short as possible, counting only parameters. So foo(a, (b|c), [d])
has length 4 and is thus better than foo(a, ((b, [d])|(c, [d])))
, which has length 5.
Is there an existing algorithm to solve this (or a similar) problem?
If not, can anyone sketch an approach?
I'm not picky about the programming language (I'm using C#, though).
The rules are:
- Parameters with the same name represent the same thing for all overloads.
a
isa
,b
isb
... - When collecting all distinct parameters over all overloads (in this case,
a, b, c, d
), every overload will adhere to this parameter order. [...]
means that the enclosed sub-expression can be omitted as a whole.(...|...|...)
means a choice of one of the sub-expressions. For readability's sake, such a sub-expression must not be empty.
To illustrate further: The (rather contrived) function bar()
bar(a, b, f, g, h, i)
bar(a, b, f, g, h)
bar(a, b, f, g)
bar(a, c, g, h, i)
bar(a, c, g, h)
bar(a, c, g)
bar(a, d, f, g, h, i)
bar(a, d, f, g, h)
bar(a, d, f, g)
bar(a, e, f, g, h, i)
bar(a, e, f, g, h)
bar(a, e, f, g)
should be represented as bar(a, (((b|d|e), f)|c), g, [h, [i]])
.
First, let's assign some nomenclature.
[...]
is an Option...., ...
is a Sequence.... | ...
is a Choice.It seems the problem is tricky for two reasons. First, the syntax just isn't the same as in a Boolean expressions. For instance, although a Choice is similar to an OR, it means "take any one", not "take at least one". So an algorithm that generates an optimal Boolean expression might result in a suboptimal result once "translated" to our syntax.
Second, the optimal solution might be something like a Sequence within a Choice within a Sequence within an Option. So any algorithm that can only create one kind of structure (like a Choice of Sequences) cannot possibly always return an optimal solution.
The following describes the solution I've found. There is also a working implementation.
First, we need to create a list of all distinct parameters over all overloads. As per the question, each overload will stick to this parameter order. So each overload can be represented as a Boolean array where each entry indicates whether the corresponding parameter is present. Now, the parameter list along with the list of overloads is given to a recursive function that works like this:
I believe that for the reasons stated above, an algorithm that finds optimal solutions cannot be much simpler. I cannot prove that this algorithm does in fact always find the optimal solution, but for the signatures I've tried so far, it did.