How can I know which parts in the code are never used?


I have legacy C++ code that I'm supposed to remove unused code from. The problem is that the code base is large.

How can I find out which code is never called/never used?


There are 19 answers

Matthieu M. (best answer)

There are two varieties of unused code:

  • the local one, that is, in some functions some paths or variables are unused (or used but in no meaningful way, like written but never read)
  • the global one: functions that are never called, global objects that are never accessed

For the first kind, a good compiler can help:

  • -Wunused (GCC, Clang) should warn about unused variables; Clang's unused-variable analysis has even been extended to warn about variables that are never read (even though they are written).
  • -Wunreachable-code (older GCC, removed in 2010; Clang still supports it) should warn about local blocks that are never reached (this happens with early returns, or with conditions that always evaluate to true)
  • there is no option I know of to warn about unused catch blocks, because the compiler generally cannot prove that no exception will be thrown.
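A minimal sketch of the "local" kind of dead code, assuming GCC or a recent Clang with -Wall; the function name is hypothetical and the exact wording of the diagnostics varies between compiler versions:

```cpp
// local_dead.cpp: deliberately contains "local" dead code.
// With -Wall, GCC and recent Clang should warn about `unused`
// (-Wunused-variable) and `written` (-Wunused-but-set-variable);
// Clang with -Wunreachable-code should also flag the statement after `return`.
int localDeadCode(int x) {
    int unused;        // declared but never used
    int written = 0;   // assigned, but its value is never read
    written = x * 2;
    return x + 1;
    x = 0;             // unreachable: follows an unconditional return
}
```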

For the second kind, it's much more difficult. Statically, it requires whole-program analysis, and even though link-time optimization may actually remove dead code, by the time it runs the program has been transformed so heavily that it is nearly impossible to report meaningful information back to the user.

There are therefore two approaches:

  • The theoretical one is to use a static analyzer: a piece of software that examines the whole code base at once, in great detail, and finds all the flow paths. In practice, I don't know of any that would work here.
  • The pragmatic one is to use a heuristic: use a code coverage tool (in the GNU toolchain it's gcov; note that specific flags, such as --coverage, must be passed during compilation for it to work properly). Run the code coverage tool with a good set of varied inputs (your unit tests or non-regression tests); the dead code is necessarily within the unreached code, so you can start from there.
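A sketch of the gcov workflow, assuming GCC; coverage_demo.cpp, its functions, and the separate test driver (tests.o) that calls run() are hypothetical. Compiling and linking as separate steps keeps the .gcno/.gcda file names predictable:

```cpp
// coverage_demo.cpp
// Build with coverage instrumentation, run the tests, then ask gcov:
//   g++ --coverage -O0 -c coverage_demo.cpp          # also writes coverage_demo.gcno
//   g++ --coverage coverage_demo.o tests.o -o tests  # tests.o: your test driver
//   ./tests                                          # writes coverage_demo.gcda
//   gcov coverage_demo.cpp                           # writes coverage_demo.cpp.gcov
// In the .gcov report, executable lines that never ran are marked "#####".

int usedHelper(int v) { return v * 2; }

// Never called by the tests, so its body should be reported as "#####".
int neverCalled(int v) { return v - 1; }

int run(int v) { return usedHelper(v) + 1; }
```

Lines marked "#####" are only candidates: they were not reached by this particular set of inputs.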

If you are extremely interested in the subject, and have the time and inclination to actually work out a tool by yourself, I would suggest using the Clang libraries to build such a tool.

  1. Use the Clang library to get an AST (abstract syntax tree)
  2. Perform a mark-and-sweep analysis from the entry points onward

Because Clang will parse the code for you and perform overload resolution, you won't have to deal with the C++ language rules, and you'll be able to concentrate on the problem at hand.

However, this kind of technique cannot identify unused virtual overrides, since they could be called by third-party code you cannot reason about.
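The mark-and-sweep step can be sketched on its own, assuming the call graph has already been extracted (in a real tool, from the Clang AST after overload resolution). The graph representation and the helpers markReachable and sweepUnreachable are hypothetical:

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical call graph: caller name -> names of the functions it calls.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Mark phase: iterative depth-first traversal from every entry point.
std::set<std::string> markReachable(const CallGraph& graph,
                                    const std::vector<std::string>& entryPoints) {
    std::set<std::string> marked;
    std::vector<std::string> stack(entryPoints.begin(), entryPoints.end());
    while (!stack.empty()) {
        std::string fn = stack.back();
        stack.pop_back();
        if (!marked.insert(fn).second) continue;  // already visited
        auto it = graph.find(fn);
        if (it == graph.end()) continue;          // leaf: calls nothing
        for (const auto& callee : it->second) stack.push_back(callee);
    }
    return marked;
}

// Sweep phase: every known function (caller or callee) that was not
// marked is a dead-code candidate. Returned in sorted order.
std::vector<std::string> sweepUnreachable(const CallGraph& graph,
                                          const std::set<std::string>& marked) {
    std::set<std::string> all;
    for (const auto& [caller, callees] : graph) {
        all.insert(caller);
        all.insert(callees.begin(), callees.end());
    }
    std::vector<std::string> dead;
    for (const auto& fn : all)
        if (!marked.count(fn)) dead.push_back(fn);
    return dead;
}
```

Because the sweep is driven by reachability from the entry points, mutually recursive dead functions (for example oldApi and legacyHelper calling only each other) are still reported. Virtual overrides would have to be added as extra entry points, for the reason given above.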

Simon Richter

My normal approach to finding unused stuff is

  1. make sure the build system handles dependency tracking correctly
  2. set up a second monitor, with a full-screen terminal window, running repeated builds and showing the first screenful of output. watch "make 2>&1" tends to do the trick on Unix.
  3. run a find-and-replace operation on the entire source tree, adding "//? " at the beginning of every line
  4. fix the first error flagged by the compiler, by removing the "//?" in the corresponding lines.
  5. Repeat until there are no errors left.

This is a somewhat lengthy process, but it does give good results.
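Steps 3 and 4 are normally done with an editor's find-and-replace; as an illustration only, the same two transformations written as hypothetical C++ helpers:

```cpp
#include <sstream>
#include <string>

// Step 3: prefix every line of a source file with "//? ".
std::string commentOutAll(const std::string& source) {
    std::istringstream in(source);
    std::string line, out;
    while (std::getline(in, line)) {
        out += "//? ";
        out += line;
        out += '\n';
    }
    return out;
}

// Step 4: restore one line by stripping the "//? " marker, if present.
std::string restoreLine(const std::string& line) {
    const std::string marker = "//? ";
    if (line.compare(0, marker.size(), marker) == 0)
        return line.substr(marker.size());
    return line;
}
```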

olsner

For the case of unused whole functions (and unused global variables), GCC and GNU ld can actually do most of the work for you.

When compiling the source, use -ffunction-sections and -fdata-sections, then when linking use -Wl,--gc-sections,--print-gc-sections. The linker will now list all the functions that could be removed because they were never called and all the globals that were never referenced.

(Of course, you can also skip the --print-gc-sections part and let the linker remove the functions silently, but keep them in the source.)

Note: this will only find unused complete functions, it won't do anything about dead code within functions. Functions called from dead code in live functions will also be kept around.
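A hypothetical input file for this approach, with the build commands as comments (GCC and GNU ld assumed; main.o stands in for the rest of the program). The exact wording of the linker's report varies between binutils versions:

```cpp
// gc_demo.cpp
// Build (GCC + GNU ld; main.o is the rest of the program):
//   g++ -ffunction-sections -fdata-sections -c gc_demo.cpp
//   g++ gc_demo.o main.o -Wl,--gc-sections,--print-gc-sections
// -ffunction-sections/-fdata-sections give each function and global its own
// section, so the linker can drop (and, with --print-gc-sections, list) the
// sections holding neverCalled() and unusedTable if nothing references them.

int unusedTable[4] = {1, 2, 3, 4};  // never referenced from live code

int neverCalled(int v) { return v + 42; }  // never called from live code

int liveFunction(int v) { return v * 3; }  // assumed to be called from main.o
```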

Some C++-specific features will also cause problems, in particular:

  • Virtual functions. Without knowing which subclasses exist and which are actually instantiated at run time, you can't know which virtual functions you need to exist in the final program. The linker doesn't have enough information about that so it will have to keep all of them around.
  • Globals with constructors, and their constructors. In general, the linker can't know that the constructor for a global doesn't have side effects, so it must run it. Obviously this means the global itself also needs to be kept.

In both cases, anything used by a virtual function or a global-variable constructor also has to be kept around.
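The two C++-specific cases can be illustrated with a small hypothetical file; the point is only which references keep code alive:

```cpp
#include <string>
#include <vector>

// 1. A global with a dynamic initializer: the linker cannot prove that
//    makeDefaults() has no side effects, so `registry`, its initialization,
//    and everything it calls are kept even if `registry` is never read.
std::vector<std::string> makeDefaults() { return {"alpha", "beta"}; }
std::vector<std::string> registry = makeDefaults();

// 2. Virtual functions: Derived::name() is referenced from Derived's vtable,
//    so it is kept whenever that vtable is kept (e.g. because live code
//    constructs a Derived), whether or not anything ever calls name().
struct Base {
    virtual ~Base() = default;
    virtual std::string name() const { return "base"; }
};

struct Derived : Base {
    std::string name() const override { return "derived"; }
};
```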

An additional caveat is that if you're building a shared library, the default settings in GCC will export every function in the shared library, causing it to be "used" as far as the linker is concerned. To fix that, you need to make hiding symbols the default instead of exporting them (using e.g. -fvisibility=hidden), and then explicitly mark the functions that you need to export.
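A sketch of the hidden-by-default setup, assuming GCC on an ELF platform; DEMO_EXPORT and the function names are hypothetical, while -fvisibility=hidden and __attribute__((visibility("default"))) are the standard GCC spellings:

```cpp
// lib.cpp: build as a shared library with hidden-by-default visibility:
//   g++ -fPIC -fvisibility=hidden -ffunction-sections -fdata-sections -c lib.cpp
//   g++ -shared lib.o -o libdemo.so -Wl,--gc-sections,--print-gc-sections
// Only symbols marked DEMO_EXPORT are exported, so hidden functions that no
// exported function uses are no longer considered "used" by the linker.
#define DEMO_EXPORT __attribute__((visibility("default")))

// Hidden, but kept because the exported publicApi() calls it.
int internalHelper(int v) { return v + 1; }

// Hidden and never called: a candidate for --gc-sections to remove.
int forgottenHelper(int v) { return v - 1; }

// Part of the public API: exported, so the linker must keep it.
DEMO_EXPORT int publicApi(int v) { return internalHelper(v) * 2; }
```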

Adam Higuera

If you are on Linux, you may want to look into callgrind, a C/C++ program analysis tool that is part of the Valgrind suite, which also contains tools that check for memory leaks and other memory errors (which you should be using as well). It analyzes a running instance of your program and produces data about its call graph and about the performance costs of the nodes in that graph. It is usually used for performance analysis, but because it records the call graph of your application, you can also see which functions are called, as well as their callers.

This is obviously complementary to the static methods mentioned elsewhere on the page, and it will only be helpful for eliminating wholly unused classes, methods, and functions; it will not help find dead code inside methods that are actually called.

Justin Morgan

The real answer here is: You can never really know for sure.

At least, for nontrivial cases, you can't be sure you've gotten all of it. Consider the following from Wikipedia's article on unreachable code:

double x = sqrt(2);
if (x > 5)
{
  doStuff();
}

As Wikipedia correctly notes, a clever compiler may be able to catch something like this. But consider a modification:

int y;
cin >> y;
double x = sqrt((double)y);

if (x != 0 && x < 1)
{
  doStuff();
}

Will the compiler catch this? Maybe. But to do that, it will need to do more than run sqrt against a constant scalar value. It will have to figure out that (double)y will always be an integer (easy), and then understand the mathematical range of sqrt for the set of integers (hard). A very sophisticated compiler might be able to do this for the sqrt function, or for every function in math.h, or for any fixed-input function whose domain it can figure out. This gets very, very complex, and the complexity is basically limitless. You can keep adding layers of sophistication to your compiler, but there will always be a way to sneak in some code that will be unreachable for any given set of inputs.
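The reasoning can be checked with a small brute-force sketch; branchTaken and branchTakenForAny are hypothetical helpers. For negative y, sqrt returns NaN, and NaN < 1 is false; for y == 0, x is 0; for y >= 1, x >= 1. So the branch is dead, yet testing a range of inputs gives evidence, not proof:

```cpp
#include <cmath>

// The condition from the example above, factored into a function.
bool branchTaken(int y) {
    double x = std::sqrt(static_cast<double>(y));
    return x != 0 && x < 1;   // NaN (from y < 0) fails `x < 1`
}

// Tries every y in [lo, hi]; returns true if any input reaches doStuff().
// Assumes hi < INT_MAX so that ++y cannot overflow.
bool branchTakenForAny(int lo, int hi) {
    for (int y = lo; y <= hi; ++y) {
        if (branchTaken(y)) return true;
    }
    return false;
}
```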

And then there are the input sets that simply never get entered. Input that would make no sense in real life, or get blocked by validation logic elsewhere. There's no way for the compiler to know about those.

The end result of this is that while the software tools others have mentioned are extremely useful, you're never going to know for sure that you caught everything unless you go through the code manually afterward. Even then, you'll never be certain that you didn't miss anything.

The only real solution, IMHO, is to be as vigilant as possible, use the automation at your disposal, refactor where you can, and constantly look for ways to improve your code. Of course, it's a good idea to do that anyway.

sharptooth

One way is to use a debugger together with the compiler's ability to eliminate unused machine code during compilation.

Once some machine code is eliminated, the debugger won't let you put a breakpoint on the corresponding line of source code. So you put breakpoints everywhere, start the program, and inspect the breakpoints: those in the "no code loaded for this source" state correspond to eliminated code. Either that code is never called, or it has been inlined, and you have to do some minimal analysis to find out which of the two happened.

At least that's how it works in Visual Studio, and I guess other toolsets can do that too.

That's a lot of work, but I guess it's faster than manually analyzing all the code.

AUS

It depends on the platform you use to create your application.

For example, if you use Visual Studio, you could use a tool like .NET ANTS Profiler, which is able to parse and profile your code. This way, you should quickly know which parts of your code are actually used. Eclipse also has equivalent plugins.

Otherwise, if you need to know which functions of your application are actually used by your end users, and if you can release your application easily, you can use a log file for an audit.

For each main function, you can trace its usage, and after a few days or weeks, just collect that log file and have a look at it.
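A minimal sketch of such an audit log; UsageLog, TRACE_USAGE, and the two application functions are hypothetical, and where dump() writes (a real log file, for example) is left to the application:

```cpp
#include <map>
#include <mutex>
#include <ostream>
#include <string>

// Hypothetical usage-audit helper: counts how often each traced function runs.
class UsageLog {
public:
    static UsageLog& instance() {
        static UsageLog theLog;  // function-local static: thread-safe init since C++11
        return theLog;
    }

    void hit(const std::string& name) {
        std::lock_guard<std::mutex> lock(mutex_);
        ++counts_[name];
    }

    int count(const std::string& name) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = counts_.find(name);
        return it == counts_.end() ? 0 : it->second;
    }

    // Writes "name count" lines, e.g. to a log file at shutdown.
    void dump(std::ostream& out) const {
        std::lock_guard<std::mutex> lock(mutex_);
        for (const auto& [name, n] : counts_) out << name << ' ' << n << '\n';
    }

private:
    mutable std::mutex mutex_;
    std::map<std::string, int> counts_;
};

// Put this at the top of each main function you want to audit;
// __func__ expands to the enclosing function's name.
#define TRACE_USAGE() UsageLog::instance().hit(__func__)

// Hypothetical application functions being audited.
void exportReport() { TRACE_USAGE(); }
void importLegacyFormat() { TRACE_USAGE(); }
```

Functions that never appear in the dump were not called during the audit period, which makes them candidates for removal rather than proof that they are dead.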

Carlos

I think you are looking for a code coverage tool. A code coverage tool will analyze your code as it is running, and it will let you know which lines of code were executed and how many times, as well as which ones were not.

You could try giving this open source code coverage tool a chance: TestCocoon - code coverage tool for C/C++ and C#.

UmmaGumma

Well, if you're using g++, you can use the -Wunused flag.

According to the documentation:

Warn whenever a variable is unused aside from its declaration, whenever a function is declared static but never defined, whenever a label is declared but not used, and whenever a statement computes a result that is explicitly not used.

http://docs.freebsd.org/info/gcc/gcc.info.Warning_Options.html
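A small hypothetical file that exercises several of these cases, assuming g++ with -Wall (which enables the -Wunused-* warnings); the exact wording of each diagnostic depends on the GCC version:

```cpp
// unused_demo.cpp: each commented line should produce a -Wunused* warning
// with g++ -Wall.
static int neverUsed(int v) { return v * 2; }  // -Wunused-function: defined but not used

int compute(int v) {
    v + 1;          // -Wunused-value: result computed but not used
unusedLabel:        // -Wunused-label: label defined but never jumped to
    return v * 3;
}
```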

Edit: Here is another useful flag, -Wunreachable-code. According to the documentation:

This option is intended to warn when the compiler detects that at least a whole line of source code will never be executed, because some condition is never satisfied or because it is after a procedure that never returns.

Update: I found a similar topic: Dead code detection in legacy C/C++ project

9dan

I don't think it can be done automatically.

Even with code coverage tools, you need to provide sufficient input data to run.

Maybe a very complex and expensive static analysis tool, such as Coverity's, or the LLVM compiler could help.

But I'm not sure and I would prefer manual code review.

UPDATED

Well... removing only unused variables and unused functions is not hard, though.

UPDATED

After reading other answers and comments, I'm even more strongly convinced that it can't be done.

You have to know the code to get a meaningful code coverage measure, and if you know it that well, manual editing will be faster than preparing, running, and reviewing coverage results.

finiteautomata

I really haven't used any tool that does such a thing... But, as far as I've seen in all the answers, no one has ever said that this problem is uncomputable.

What do I mean by this? That this problem cannot be solved by any algorithm running on a computer. This theorem (that such an algorithm doesn't exist) is a corollary of Turing's Halting Problem.

None of the tools you will use are exact algorithms; they are heuristics, so they will not give you exactly all the code that's not used.

Mr Shark

I haven't used it myself, but cppcheck claims to find unused functions (its unusedFunction check, enabled with --enable=unusedFunction). It probably won't solve the complete problem, but it might be a start.

Tony

You could try using PC-lint/FlexeLint from Gimpel Software. It claims to

find unused macros, typedef's, classes, members, declarations, etc. across the entire project

I've used it for static analysis and found it very good but I have to admit that I have not used it to specifically find dead code.

Lie Ryan

Mark as many public functions and variables as possible as private or protected without causing compilation errors, and while doing this, try to also refactor the code. By making functions private (and, to some extent, protected), you reduce your search area, since private functions can only be called from the same class (unless there are stupid macros or other tricks to circumvent access restrictions, and if that's the case I'd recommend you find a new job). It is much easier to determine that you don't need a private function, since only the class you're currently working on can call it. This method is easier if your code base has small classes and is loosely coupled. If your code base does not have small classes, or has very tight coupling, I suggest cleaning those up first.

Next, take all the remaining public functions and build a call graph to figure out the relationships between the classes. From this graph, try to figure out which branches look like they can be trimmed.

The advantage of this method is that you can do it on a per-module basis, so it is easy to keep your unit tests passing without long periods during which the code base is broken.

Luis Colorado

The general problem of whether some function will be called is undecidable. You cannot know in advance, in general, whether some function will be called, just as you cannot know whether a Turing machine will ever halt. You can determine (statically) whether there is some path from main() to the function you have written, but that doesn't guarantee it will ever be called. Deciding whether the function will actually be called is undecidable in the general case.

Kaz

The GNU linker has a --cref option which produces cross-reference information. You can pass this from the gcc command line via -Wl,--cref.

For instance, suppose that foo.o defines a symbol foo_sym which is also used in bar.o. Then in the output you will see:

foo_sym                            foo.o
                                   bar.o

If foo_sym is confined to foo.o, then you won't see any additional object files; it will be followed by another symbol:

foo_sym                            foo.o
force_flag                         options.o

Now, from this we do not know that foo_sym is not used. It's just a candidate: we know that it's defined in one file, and not used in any others. foo_sym could be defined in foo.o and used there.

So, what you do with this information is

  1. Do some text munging to identify these symbols that are confined to one object file, producing a list of candidates.
  2. Go into the source code, and give each of the candidates internal linkage with static, like it should have.
  3. Recompile the source.
  4. Now, for any of those symbols which are really unused, the compiler will be able to warn, pinpointing them for you; you can delete those.
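Step 1 could be sketched like this, assuming the table body is in the layout shown above (header lines already removed) and that symbol names contain no spaces, e.g. C names, or C++ names left mangled via -Wl,--no-demangle; singleFileSymbols is a hypothetical helper:

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Turns GNU ld --cref table lines into candidates: symbols that appear in
// exactly one object file. A symbol line starts in column 0 ("symbol file");
// indented continuation lines list further files that reference the symbol.
std::vector<std::string> singleFileSymbols(const std::string& crefOutput) {
    std::map<std::string, std::vector<std::string>> files;  // symbol -> files
    std::istringstream in(crefOutput);
    std::string line, current;
    while (std::getline(in, line)) {
        if (line.empty()) continue;
        std::istringstream fields(line);
        std::string file;
        if (line[0] != ' ' && line[0] != '\t') {
            fields >> current >> file;     // new symbol line
        } else if (!current.empty()) {
            fields >> file;                // continuation line
        }
        if (!current.empty() && !file.empty()) files[current].push_back(file);
    }
    std::vector<std::string> candidates;   // sorted, since std::map is ordered
    for (const auto& [symbol, where] : files)
        if (where.size() == 1) candidates.push_back(symbol);
    return candidates;
}
```

As the answer says, the result is only a candidate list: a symbol defined and used within a single object file appears in it too, which is why steps 2 to 4 follow.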

Of course, I'm ignoring the possibility that some of those symbols are unused on purpose, because they are exported for dynamic linkage (which can be the case even when an executable is linked); that's a more nuanced situation that you have to know about and intelligently deal with.

Steven Lu

I had a friend ask me this very question today, and I looked around at some promising Clang developments, e.g. ASTMatchers and the Static Analyzer, that might have sufficient visibility into the goings-on during compilation to determine the dead code sections, but then I found this:

https://blog.flameeyes.eu/2008/01/today-how-to-identify-unused-exported-functions-and-variables

It's pretty much a complete description of how to use a few GCC flags that are seemingly designed for the purpose of identifying unreferenced symbols!

Roman Boiko

CppDepend is a commercial tool which can detect unused types, methods and fields, and do much more. It is available for Windows and Linux (but currently has no 64-bit support), and comes with a 2-week trial.

Disclaimer: I don't work there, but I own a license for this tool (as well as NDepend, which is a more powerful alternative for .NET code).

For those who are curious, here is an example built-in (customizable) rule for detecting dead methods, written in CQLinq:

// <Name>Potentially dead Methods</Name>
warnif count > 0
// Filter procedure for methods that shouldn't be considered as dead
let canMethodBeConsideredAsDeadProc = new Func<IMethod, bool>(
    m => !m.IsPublic &&       // Public methods might be used by client applications of your Projects.
         !m.IsEntryPoint &&            // Main() method is not used by-design.
         !m.IsClassConstructor &&      
         !m.IsVirtual &&               // Only check non-virtual methods that are not seen as used in IL.
         !(m.IsConstructor &&          // Ignore protected ctors that might be called by derived-class ctors.
           m.IsProtected) &&
         !m.IsGeneratedByCompiler
)

// Get methods unused
let methodsUnused = 
   from m in JustMyCode.Methods where 
   m.NbMethodsCallingMe == 0 && 
   canMethodBeConsideredAsDeadProc(m)
   select m

// Dead methods = methods used only by unused methods (recursive)
let deadMethodsMetric = methodsUnused.FillIterative(
   methods => // Single-iteration loop, just to give a chance to build the hashset.
              from o in new[] { new object() }
              // Use a hashset to make Intersect calls much faster!
              let hashset = methods.ToHashSet()
              from m in codeBase.Application.Methods.UsedByAny(methods).Except(methods)
              where canMethodBeConsideredAsDeadProc(m) &&
                    // Select methods called only by methods already considered as dead
                    hashset.Intersect(m.MethodsCallingMe).Count() == m.NbMethodsCallingMe
              select m)

from m in JustMyCode.Methods.Intersect(deadMethodsMetric.DefinitionDomain)
select new { m, m.MethodsCallingMe, depth = deadMethodsMetric[m] }
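For readers not using CppDepend, the iterative part of this rule (start from methods nobody calls, then keep adding methods called only by already-dead methods) can be sketched in C++. This is not CppDepend's API; deadMethods, the Callers map, and the `excluded` set (standing in for canMethodBeConsideredAsDeadProc) are hypothetical:

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical input: method -> the set of methods that call it.
using Callers = std::map<std::string, std::set<std::string>>;

// Fixed-point iteration: a method is dead if it is not excluded and all of
// its callers are already dead (vacuously true when it has no callers).
std::set<std::string> deadMethods(const Callers& callers,
                                  const std::set<std::string>& excluded) {
    std::set<std::string> dead;
    bool changed = true;
    while (changed) {
        changed = false;
        for (const auto& [method, whoCalls] : callers) {
            if (dead.count(method) || excluded.count(method)) continue;
            bool allCallersDead = true;
            for (const auto& caller : whoCalls) {
                if (!dead.count(caller)) { allCallersDead = false; break; }
            }
            if (allCallersDead) {
                dead.insert(method);
                changed = true;
            }
        }
    }
    return dead;
}
```

One consequence of this design: a group of methods that only call each other (a dead cycle) is never flagged, because none of its members starts with zero callers; a mark-and-sweep from the entry points does catch such cycles.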