Is finding pointers in C/C++ code statically equivalent to the Halting Ρroblem?

568 views Asked by At

I'm not too deeply rooted in the very formal side of static code analysis, hence this question.

A couple of years ago I read that distinguishing code from data using static code analysis is equivalent to the Halting Problem. (Citation needed, but I don't have it anymore. Stackoverflow has threads on this here or here.) At least for common computer architectures based on the Von Neumann architecture where code and data share the same memory this seemed to make sense.

Now I'm looking at the static analysis of C/C++ code and pointer analysis; the program does not execute. Somehow I have a feeling that tracking all creations and uses of pointer values statically is similar to the Halting Problem because I can not determine if a given value in memory is a pointer value, i.e. I can not track the value-flow of pointer values through memory. Alias analysis may narrow down the problem, but it seems to become less useful in the face of multi-threaded code.

(One might even consider tracking arbitrary values, not just pointers: constructing a complete value-flow for any given "interesting" value seems equivalent to the Halting Problem.)

As this is just a hunch, my question is: are the more formal findings on this that I can refer to? Am I mistaken?

4

There are 4 answers

15
R.. GitHub STOP HELPING ICE On

It's almost certainly equivalent, modulo the fact that C is not a turing-equivalent language (a given C implementation is a gigantic finite state machine rather than a turing machine, due to the Representation of Types). Pointers need not be kept in their original representations in objects whose effective type is pointer type; you can examine the representation and perform arbitrary operations on it, for example, encrypting pointers and decrypting them later. Determining whether an arbitrary computation is reversible, or whether two computations are inverses of one another, is (offhand) probably equivalent to determining halting.

4
Komi Golov On

If I understood you correctly: yes, checking whether a C or C++ program accesses an invalid pointer is equivalent to the halting problem (of a C or C++ program, in any case).

Suppose you had a tool that told you whether a program accessed an invalid pointer, and a program you wanted to check for halting. By adding extra information to each pointer you can make it checkable (at runtime) whether the pointer is valid or not; add such checks, with an infinite loop on failure. You now have a program with no invalid pointer accesses. By replacing all places the program can terminate with an invalid pointer access you get a program which has an invalid pointer access if and only if the original program terminates.

5
Cheers and hth. - Alf On

You can always code up this:

extern bool some_program_halts();
extern int* invalid_pointer();

#include <iostream>
int main()
{
    using namespace std;
    if( some_program_halts() ) { cout << *invalid_pointer() << endl; }
}

Checking whether this program dereferences the invalid pointer is equivalent to finding out whether the call to some_program_halts(), uh, halts.

0
mpartel On

Static analysis is almost always an approximation, often provable by reduction to the halting problem with programs like the one in Alf's answer. However, the approximation can err on the side of either false positives or false negatives.

  • A "conservative" static check will only have false negatives. It will never accept a "bad" program, but it will inevitably reject some "sufficiently complicated" good programs.
  • A "liberal" static check will have false positives. Sometimes it accepts a bad program by mistake but (generally) it will also accept all good programs.

Some examples:

  • Java's type system is conservative: a variable with a type T will always contain an instance of type T (or a subtype of T or null) at runtime no matter what.
  • GCC's option to warn about uninitialized variables is liberal: it doesn't find all potential uses of an uninitialized variable. Here's an example of a false positive program.
  • In contrast, Java does a conservative uninitialized variable check for local variables. It refuses to compile the program if it sees any potential execution path using a potentially uninitialized variable.

Liberal checks are often used by compilers to emit warnings and by external static analysis tools. Things like type systems and compiler optimizations tend to rely on conservative checks to be correct.

Many tasks have several reasonable conservative and liberal algorithms of varying accuracy. Alias analysis is certainly one of these.

For more information, see any good compiler textbook, such as the dragon book.