Approaches for reversing sprintf/format

Question

738 views. Asked by Daniel Rikowski on 11 February 2011 at 08:19.

I have to heuristically determine the format pattern strings by analyzing the formatted results.

For example I have these strings:

You have 3 unread messages.

You have 10 unread messages.

I'm sorry, Dave. I'm afraid I can't do that.

I'm sorry, Frank. I'm afraid I can't do that.

This statement is false.

I want to derive these format strings:

You have %s unread messages.

I'm sorry, %s. I'm afraid I can't do that.

This statement is false.

Which approaches and/or algorithms could help me here?

My first thought was to use machine learning, but my gut tells me this could be a rather classic problem.

Some additional requirements:

The type of the parameter is irrelevant, i.e. I don't need to know whether the parameter was originally %s or %d, or whether it was padded or aligned.
There can be more than one parameter (or none at all).
Typically the data consists of thousands of formatted strings, but only tens of format patterns.

Original Q&A

There is 1 answer

**Fred Foo** · Answer 1 · 2011-02-11T09:04:31+00:00

Cluster the strings by some metric of similarity (I'd try length of longest common subsequence, LCS). Determining the number of clusters is the hard part, if you don't know it beforehand.
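A minimal sketch of the clustering step in Python, assuming a character-level LCS and a fixed similarity threshold instead of a known cluster count. The names `lcs_length`, `similarity` and `cluster`, and the threshold of 0.7, are illustrative choices and not part of the answer; the threshold just trades "how many clusters?" for "how similar is similar enough?".

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """LCS length normalised to [0, 1]; 1.0 means the strings are identical."""
    if not a and not b:
        return 1.0
    return 2 * lcs_length(a, b) / (len(a) + len(b))


def cluster(strings, threshold=0.7):
    """Greedy clustering: a string joins the first cluster whose first
    member is similar enough, otherwise it starts a new cluster.
    The threshold value is an assumption and needs tuning on real data."""
    clusters = []
    for s in strings:
        for members in clusters:
            if similarity(s, members[0]) >= threshold:
                members.append(s)
                break
        else:  # no existing cluster was close enough
            clusters.append([s])
    return clusters


if __name__ == "__main__":
    messages = [
        "You have 3 unread messages.",
        "You have 10 unread messages.",
        "I'm sorry, Dave. I'm afraid I can't do that.",
        "I'm sorry, Frank. I'm afraid I can't do that.",
        "This statement is false.",
    ]
    for group in cluster(messages):
        print(group)
```

Comparing only against the first member of each cluster keeps the cost roughly proportional to (number of strings) × (number of patterns), which suits thousands of strings and tens of patterns, but it makes the result depend on input order.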
Within each cluster, determine the LCS of all strings in it, recording the position of the gaps that occur. Replace the gaps with %s. (You may want to build a function that returns an LCS-based format string and fold/reduce that over the cluster.)
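The fold/reduce idea could look roughly like this in Python. It is only a sketch: templates are kept as lists of characters with a `GAP` sentinel so that a partially built template can be merged with the next string, and `lcs_merge`, `render` and `derive_format` are hypothetical names. Escaping literal `%` as `%%` is an extra assumption about the target format syntax.

```python
from functools import reduce

GAP = None  # sentinel marking "something variable was here"


def lcs_merge(a, b):
    """Align a and b along one LCS; keep the common elements and put a
    single GAP wherever either side has unmatched material."""
    n, m = len(a), len(b)
    # dp[i][j] = LCS length of a[i:] and b[j:]; a GAP never matches anything.
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] is not GAP and a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    out = []

    def add_gap():
        if not out or out[-1] is not GAP:  # collapse adjacent gaps
            out.append(GAP)

    i = j = 0
    while i < n and j < m:
        if a[i] is not GAP and a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            add_gap()
            i += 1
        else:
            add_gap()
            j += 1
    if i < n or j < m:  # leftover tail on either side
        add_gap()
    return out


def render(template):
    """Turn a template list into a format string."""
    return "".join("%s" if t is GAP else t.replace("%", "%%") for t in template)


def derive_format(cluster):
    """Fold lcs_merge over every string in one (non-empty) cluster."""
    return render(reduce(lcs_merge, (list(s) for s in cluster)))


if __name__ == "__main__":
    print(derive_format(["You have 3 unread messages.",
                         "You have 10 unread messages."]))
    print(derive_format(["This statement is false."]))
```

Working on characters matches the answer's wording; splitting on whitespace and merging word lists instead would be a reasonable variation if parameters are always whole words.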

The above is a greedy algorithm that, given {foobar, fooBaR}, produces foo%sa%s. You may want to replace any pair of %s occurrences separated by a single character (or a single non-whitespace character, etc.) with a single %s, recursively.
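One way to sketch that clean-up pass in Python is a regular expression applied until the string stops changing. The function name `collapse_placeholders` is made up for illustration, the non-whitespace variant (`\S`) is chosen arbitrarily, and the sketch assumes the format string contains no escaped literal percent signs.

```python
import re

# Two placeholders separated by exactly one non-whitespace character.
# Assumption: the format string has no "%%" escapes that could confuse this.
_PAIR = re.compile(r"%s\S%s")


def collapse_placeholders(fmt: str) -> str:
    """Repeatedly merge '%s<char>%s' into a single '%s' until stable."""
    while True:
        merged = _PAIR.sub("%s", fmt)
        if merged == fmt:
            return fmt
        fmt = merged


if __name__ == "__main__":
    print(collapse_placeholders("foo%sa%s"))
    print(collapse_placeholders("I'm sorry, %sa%s. I'm afraid I can't do that."))
```

The loop is needed because one pass only handles non-overlapping matches: `%sa%sb%s` first becomes `%sb%s` and only then `%s`.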
