Flattening a control flow graph to structured code


I would like to render a control flow graph (CFG) out to high-level code. Normally this is very easy: walk the graph, render each basic block in turn, and glue it all together with gotos.

Unfortunately, gotos are out of fashion these days, and most modern languages don't support them. So I need some way to glue my basic blocks together using only those control flow statements that exist in the language: for, while, do...while, if, break and continue. (I'm not willing to consider building a state machine using variables.)

It would appear that while there are algorithms to do this, they will not work in every case. That is, it's possible to construct a CFG that cannot be flattened to structured code using only the above limited set of control flow structures.

This seems intuitively obvious to me, but I can't prove it (and the documentation for the algorithms I've found doesn't go into more detail). And I haven't been able to find an example of a CFG which can't be flattened like this.

I would like to know, definitively, if this is possible or not.

Option (a): does anyone have an example of a CFG which cannot be flattened as described above? (Which will tell me that it's not possible.)

Option (b): does anyone have a proof that CFGs can be flattened as described above? (Which will tell me that it is possible.) An algorithm to do it would be highly desirable, too, as I would then have to make it work...

There are 4 answers

David Given (best answer)

I think I have a result.

The answer seems to be: it is not possible. This is from Communications of the ACM, volume 9, pages 366 to 371, in a 1966 paper called "Flow Diagrams, Turing Machines and Languages with only Two Formation Rules" by Corrado Böhm and Giuseppe Jacopini. CiteSeer link. (Which, amusingly, I found referenced from Dijkstra's seminal (and, from my point of view, incredibly annoying) Go To Statement Considered Harmful.)

Disappointingly, they don't have a proof, saying they were unable to find one.

The good news is that the paper does describe a strategy for converting an arbitrary CFG into a CFG using only limited control-flow mechanisms in an efficient fashion, using as little state as possible. The paper is pretty hard going but it looks promising.

vidstige

If the control flow graph is not reducible, then it cannot be "flattened" to structured control flow as you describe. Any irreducible CFG contains some variant of the following:

[Figure: irreducible control flow graph]

Here y and z form a loop that can be entered at either y or z, which is impossible to create with normal structured control flow.
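One way to test whether a given CFG has this problem is the classic T1/T2 reduction: repeatedly remove self-loops (T1) and fold any node with a single predecessor into that predecessor (T2). The graph is reducible exactly when it collapses to one node. The following is a minimal sketch, assuming the CFG is a dict mapping each node to the set of its successors and that every node is reachable from the entry; the name `is_reducible` and the example graphs are illustrative only.

```python
def is_reducible(cfg, entry):
    """Return True if the CFG collapses to a single node under T1/T2.

    cfg maps every node to the set of its successors. Assumes every node
    appears as a key and is reachable from entry.
    """
    succ = {n: set(ss) for n, ss in cfg.items()}  # private working copy
    changed = True
    while changed and len(succ) > 1:
        changed = False
        for n in list(succ):
            succ[n].discard(n)  # T1: remove a self-loop
            if n == entry:
                continue  # the entry is never folded into another node
            preds = [p for p in succ if n in succ[p]]
            if len(preds) == 1:
                # T2: fold n into its unique predecessor p
                p = preds[0]
                succ[p].discard(n)
                succ[p] |= succ.pop(n)
                changed = True
                break  # restart the scan on the smaller graph
    return len(succ) == 1


# Hypothetical encoding of the graph discussed here: x branches to y and z,
# and y and z branch to each other.
irreducible = {"x": {"y", "z"}, "y": {"z"}, "z": {"y"}}
# An ordinary while loop: a -> b, b -> c or d, c -> b.
while_loop = {"a": {"b"}, "b": {"c", "d"}, "c": {"b"}, "d": set()}

print(is_reducible(irreducible, "x"))  # False
print(is_reducible(while_loop, "a"))   # True
```

This sketch is quadratic or worse and is only meant to make the definition concrete; it is not tuned for large graphs.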

However, most control flow graphs can be converted. You can use an auxiliary data structure, the dominator tree, to do this. See this implementation in Haskell, for example.
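To give an idea of how dominators help, here is a small sketch (in Python rather than Haskell, and not taken from the linked implementation) that computes dominator sets with the simple iterative algorithm and then lists back edges, that is, edges whose target dominates their source. Each back edge marks a loop with a single header, which is what `while` and `continue` can express. The function names and the dict-of-sets CFG encoding are assumptions made for the example.

```python
def dominators(cfg, entry):
    """Map each node to the set of nodes that dominate it.

    cfg maps every node to the set of its successors; every successor
    must also appear as a key.
    """
    nodes = set(cfg)
    preds = {n: set() for n in nodes}
    for n, ss in cfg.items():
        for s in ss:
            preds[s].add(n)

    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def back_edges(cfg, entry):
    """Return the edges (src, dst) where dst dominates src."""
    dom = dominators(cfg, entry)
    return {(n, s) for n, ss in cfg.items() for s in ss if s in dom[n]}


while_loop = {"a": {"b"}, "b": {"c", "d"}, "c": {"b"}, "d": set()}
print(back_edges(while_loop, "a"))  # {('c', 'b')}
```

In the irreducible graph from this answer, neither y nor z dominates the other, so the cycle between them contains no back edge and there is no single loop header to hang a `while` on.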

Bill - K5WL

[Figure: irreducible graph]

The above graph, while irreducible, can be implemented with structured control flow by adding a helper variable, say "jump". We add a block w above x which sets jump=false and goes to x. Then we create a new block v that sets jump=true, and direct the right edge coming out of x to v instead of to z. Then we direct v, x, and z to a new block u, which tests jump: if jump=true it goes to z, otherwise to y. Finally, we add a first statement to z that sets jump=false. This adds a minimum of code with no duplication and turns the structure into a loop with a single entry.

Reducing irreducible control flow:

[Figure: reducing irreducible flow]
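The transformation described above can be sketched in code. Since the figures are not reproduced here, the example makes some assumptions: x branches to y or z on a hypothetical flag `start_at_z`, y falls through to z, z loops back to y, and a hypothetical `budget` counter provides an exit from the loop. `run_goto_style` is a reference interpreter that follows the edges directly; `run_structured` uses only `if`, `while` and `break` plus the `jump` helper variable.

```python
def run_goto_style(start_at_z, budget):
    """Reference behaviour: follow CFG edges directly, as gotos would."""
    trace = []
    block = "x"
    while block != "exit":
        trace.append(block)
        if block == "x":
            block = "z" if start_at_z else "y"
        elif block == "y":
            budget -= 1
            block = "z" if budget > 0 else "exit"
        else:  # block == "z"
            budget -= 1
            block = "y" if budget > 0 else "exit"
    return trace


def run_structured(start_at_z, budget):
    """Same behaviour with one loop, one entry, and the helper variable."""
    trace = []
    jump = False              # block w
    trace.append("x")         # block x
    if start_at_z:
        jump = True           # block v: replaces the direct edge x -> z
    while True:               # block u: the single loop entry
        if not jump:
            trace.append("y")     # block y
            budget -= 1
            if budget <= 0:
                break
        jump = False          # first statement of z
        trace.append("z")     # block z
        budget -= 1
        if budget <= 0:
            break
    return trace


print(run_structured(False, 3))  # ['x', 'y', 'z', 'y']
print(run_structured(True, 3))   # ['x', 'z', 'y', 'z']
```

Both functions visit the blocks in the same order for every input, and neither y nor z is duplicated; the only cost is the extra test of `jump` on each iteration.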

dinfuehr

Although this question was asked a long time ago, this actually seems to be possible. Mozilla had a similar problem when compiling LLVM to JS (or now WebAssembly). JS and WebAssembly only allow structured control flow, while LLVM allows arbitrary control flow.

They've written a paper about this, which is also used for WebAssembly:

This idea is modeled on the Relooper algorithm from 2011. There is a proof there that any control flow can be represented in a structured way, using just the available control flow constructs in JavaScript, and using a helper variable like label mentioned in the Tilt semantics, without any code duplication (other approaches split nodes, and have bad worst-case code size situations). The relooper has also been implemented in Emscripten, and over the last 4 years we have gotten a lot of practical experience with it, showing that it gives good results in practice, typically with little usage of the helper variable.