I have legacy C++ code that I'm supposed to remove unused code from. The problem is that the code base is large.
How can I find out which code is never called/never used?
I have legacy C++ code that I'm supposed to remove unused code from. The problem is that the code base is large.
How can I find out which code is never called/never used?
My normal approach to finding unused stuff is
watch "make 2>&1"
tends to do the trick on Unix.This is a somewhat lengthy process, but it does give good results.
For the case of unused whole functions (and unused global variables), GCC can actually do most of the work for you provided that you're using GCC and GNU ld.
When compiling the source, use -ffunction-sections
and -fdata-sections
, then when linking use -Wl,--gc-sections,--print-gc-sections
. The linker will now list all the functions that could be removed because they were never called and all the globals that were never referenced.
(Of course, you can also skip the --print-gc-sections
part and let the linker remove the functions silently, but keep them in the source.)
Note: this will only find unused complete functions, it won't do anything about dead code within functions. Functions called from dead code in live functions will also be kept around.
Some C++-specific features will also cause problems, in particular:
In both cases, anything used by a virtual function or a global-variable constructor also has to be kept around.
An additional caveat is that if you're building a shared library, the default settings in GCC will export every function in the shared library, causing it to be "used" as far as the linker is concerned. To fix that you need to set the default to hiding symbols instead of exporting (using e.g. -fvisibility=hidden
), and then explicitly select the exported functions that you need to export.
If you are on Linux, you may want to look into callgrind
, a C/C++ program analysis tool that is part of the valgrind
suite, which also contains tools that check for memory leaks and other memory errors (which you should be using as well). It analyzes a running instance of your program, and produces data about its call graph, and about the performance costs of nodes on the call graph. It is usually used for performance analysis, but it also produces a call graph for your applications, so you can see what functions are called, as well as their callers.
This is obviously complementary to the static methods mentioned elsewhere on the page, and it will only be helpful for eliminating wholly unused classes, methods, and functions - it well not help find dead code inside methods which are actually called.
The real answer here is: You can never really know for sure.
At least, for nontrivial cases, you can't be sure you've gotten all of it. Consider the following from Wikipedia's article on unreachable code:
double x = sqrt(2);
if (x > 5)
{
doStuff();
}
As Wikipedia correctly notes, a clever compiler may be able to catch something like this. But consider a modification:
int y;
cin >> y;
double x = sqrt((double)y);
if (x != 0 && x < 1)
{
doStuff();
}
Will the compiler catch this? Maybe. But to do that, it will need to do more than run sqrt
against a constant scalar value. It will have to figure out that (double)y
will always be an integer (easy), and then understand the mathematical range of sqrt
for the set of integers (hard). A very sophisticated compiler might be able to do this for the sqrt
function, or for every function in math.h, or for any fixed-input function whose domain it can figure out. This gets very, very complex, and the complexity is basically limitless. You can keep adding layers of sophistication to your compiler, but there will always be a way to sneak in some code that will be unreachable for any given set of inputs.
And then there are the input sets that simply never get entered. Input that would make no sense in real life, or get blocked by validation logic elsewhere. There's no way for the compiler to know about those.
The end result of this is that while the software tools others have mentioned are extremely useful, you're never going to know for sure that you caught everything unless you go through the code manually afterward. Even then, you'll never be certain that you didn't miss anything.
The only real solution, IMHO, is to be as vigilant as possible, use the automation at your disposal, refactor where you can, and constantly look for ways to improve your code. Of course, it's a good idea to do that anyway.
One way is use a debugger and the compiler feature of eliminating unused machine code during compilation.
Once some machine code is eliminated the debugger won't let you put a breakpojnt on corresponding line of source code. So you put breakpoints everywhere and start the program and inspect the breakpoints - those which are in "no code loaded for this source" state correspond to eliminated code - either that code is never called or it has been inlined and you have to perform some minimum analysis to find which of those two happened.
At least that's how it works in Visual Studio and I guess other toolsets also can do that.
That's lots of work, but I guess faster than manually analyzing all code.
It depends of the platform you use to create your application.
For example, if you use Visual Studio, you could use a tool like .NET ANTS Profiler which is able to parse and profile your code. This way, you should quickly know which part of your code is actually used. Eclipse also have equivalent plugins.
Otherwise, if you need to know what function of your application is actually used by your end user, and if you can release your application easily, you can use a log file for an audit.
For each main function, you can trace its usage, and after a few days/week just get that log file, and have a look at it.
I think you are looking for a code coverage tool. A code coverage tool will analyze your code as it is running, and it will let you know which lines of code were executed and how many times, as well as which ones were not.
You could try giving this open source code coverage tool a chance: TestCocoon - code coverage tool for C/C++ and C#.
Well if you using g++ you can use this flag -Wunused
According documentation:
Warn whenever a variable is unused aside from its declaration, whenever a function is declared static but never defined, whenever a label is declared but not used, and whenever a statement computes a result that is explicitly not used.
http://docs.freebsd.org/info/gcc/gcc.info.Warning_Options.html
Edit: Here is other useful flag -Wunreachable-code
According documentation:
This option is intended to warn when the compiler detects that at least a whole line of source code will never be executed, because some condition is never satisfied or because it is after a procedure that never returns.
Update: I found similar topic Dead code detection in legacy C/C++ project
I don't think it can be done automatically.
Even with code coverage tools, you need to provide sufficient input data to run.
May be very complex and high priced static analysis tool such as from Coverity's or LLVM compiler could be help.
But I'm not sure and I would prefer manual code review.
UPDATED
Well.. only removing unused variables, unused functions is not hard though.
UPDATED
After read other answers and comments, I'm more strongly convinced that it can't be done.
You have to know the code to have meaningful code coverage measure, and if you know that much manual editing will be faster than prepare/run/review coverage results.
I really haven't used any tool that does such a thing... But, as far as I've seen in all the answers, no one has ever said that this problem is uncomputable.
What do I mean by this? That this problem cannot be solved by any algorithm ever on a computer. This theorem (that such an algorithm doesn't exist) is a corollary of Turing's Halting Problem.
All the tools you will use are not algorithms but heuristics (i.e not exact algorithms). They will not give you exactly all the code that's not used.
I haven't used it myself, but cppcheck, claims to find unused functions. It probably won't solve the complete problem but it might be a start.
You could try using PC-lint/FlexeLint from Gimple Software. It claims to
find unused macros, typedef's, classes, members, declarations, etc. across the entire project
I've used it for static analysis and found it very good but I have to admit that I have not used it to specifically find dead code.
Well if you using g++ you can use this flag -Wunused
According documentation:
Warn whenever a variable is unused aside from its declaration, whenever a function is declared static but never defined, whenever a label is declared but not used, and whenever a statement computes a result that is explicitly not used.
http://docs.freebsd.org/info/gcc/gcc.info.Warning_Options.html
Edit: Here is other usefull flag -Wunreachable-code According documentation:
This option is intended to warn when the compiler detects that at least a whole line of source code will never be executed, because some condition is never satisfied or because it is after a procedure that never returns.
Mark as much public functions and variables as private or protected without causing compilation error, while doing this, try to also refactor the code. By making functions private and to some extent protected, you reduced your search area since private functions can only be called from the same class (unless there are stupid macro or other tricks to circumvent access restriction, and if that's the case I'd recommend you find a new job). It is much easier to determine that you don't need a private function since only the class you're currently working on can call this function. This method is easier if your code base have small classes and is loosely coupled. If your code base does not have small classes or have very tight coupling, I suggest cleaning those up first.
Next will be to mark all the remaining public functions and make a call graph to figure out the relationship between the classes. From this tree, try to figure out which part of the branch looks like it can be trimmed.
The advantage of this method is that you can do it on per module basis, so it is easy to keep passing your unittest without having large period of time when you've got broken code base.
The general problem of if some function will be called is undecidable. You cannot know in advance in a general way if some function will be called as you won't know if a Turing machine will ever stop. You can get if there's some path (statically) that goes from main() to the function you have written, but that doesn't warrant you it will ever be called. The set of decisions to decide if the function will be called is undecidable, if taken from a general form.
The GNU linker has a --cref
option which produces cross-reference information. You can pass this from the gcc
command line via -Wl,--cref
.
For instance, suppose that foo.o
defines a symbol foo_sym
which is also used in bar.o
. Then in the output you will see:
foo_sym foo.o
bar.o
If foo_sym
is confined to foo.o
, then you won't see any additional object files; it will be followed by another symbol:
foo_sym foo.o
force_flag options.o
Now, from this we do not know that foo_sym
is not used. It's just a candidate: we know that it's defined in one file, and not used in any others. foo_sym
could be defined in foo.o
and used there.
So, what you do with this information is
static
, like it should have.Of course, I'm ignoring the possibility that some of those symbols are unused on purpose, because they are exported for dynamic linkage (which can be the case even when an executable is linked); that's a more nuanced situation that you have to know about and intelligently deal with.
I had a friend ask me this very question today, and I looked around at some promising Clang developments, e.g. ASTMatchers and the Static Analyzer that might have sufficient visibility in the goings-on during compiling to determine the dead code sections, but then I found this:
https://blog.flameeyes.eu/2008/01/today-how-to-identify-unused-exported-functions-and-variables
It's pretty much a complete description of how to use a few GCC flags that are seemingly designed for the purpose of identifying unreferenced symbols!
CppDepend is a commercial tool which can detect unused types, methods and fields, and do much more. It is available for Windows and Linux (but currently has no 64-bit support), and comes with a 2-week trial.
Disclaimer: I don't work there, but I own a license for this tool (as well as NDepend, which is a more powerful alternative for .NET code).
For those who are curious, here is an example built-in (customizable) rule for detecting dead methods, written in CQLinq:
// <Name>Potentially dead Methods</Name>
warnif count > 0
// Filter procedure for methods that should'nt be considered as dead
let canMethodBeConsideredAsDeadProc = new Func<IMethod, bool>(
m => !m.IsPublic && // Public methods might be used by client applications of your Projects.
!m.IsEntryPoint && // Main() method is not used by-design.
!m.IsClassConstructor &&
!m.IsVirtual && // Only check for non virtual method that are not seen as used in IL.
!(m.IsConstructor && // Don't take account of protected ctor that might be call by a derived ctors.
m.IsProtected) &&
!m.IsGeneratedByCompiler
)
// Get methods unused
let methodsUnused =
from m in JustMyCode.Methods where
m.NbMethodsCallingMe == 0 &&
canMethodBeConsideredAsDeadProc(m)
select m
// Dead methods = methods used only by unused methods (recursive)
let deadMethodsMetric = methodsUnused.FillIterative(
methods => // Unique loop, just to let a chance to build the hashset.
from o in new[] { new object() }
// Use a hashet to make Intersect calls much faster!
let hashset = methods.ToHashSet()
from m in codeBase.Application.Methods.UsedByAny(methods).Except(methods)
where canMethodBeConsideredAsDeadProc(m) &&
// Select methods called only by methods already considered as dead
hashset.Intersect(m.MethodsCallingMe).Count() == m.NbMethodsCallingMe
select m)
from m in JustMyCode.Methods.Intersect(deadMethodsMetric.DefinitionDomain)
select new { m, m.MethodsCallingMe, depth = deadMethodsMetric[m] }
There are two varieties of unused code:
For the first kind, a good compiler can help:
-Wunused
(GCC, Clang) should warn about unused variables, Clang unused analyzer has even been incremented to warn about variables that are never read (even though used).-Wunreachable-code
(older GCC, removed in 2010) should warn about local blocks that are never accessed (it happens with early returns or conditions that always evaluate to true)catch
blocks, because the compiler generally cannot prove that no exception will be thrown.For the second kind, it's much more difficult. Statically it requires whole program analysis, and even though link time optimization may actually remove dead code, in practice the program has been so much transformed at the time it is performed that it is near impossible to convey meaningful information to the user.
There are therefore two approaches:
gcov
. Note that specific flags should be passed during compilation for it to work properly). You run the code coverage tool with a good set of varied inputs (your unit-tests or non-regression tests), the dead code is necessarily within the unreached code... and so you can start from here.If you are extremely interested in the subject, and have the time and inclination to actually work out a tool by yourself, I would suggest using the Clang libraries to build such a tool.
Because Clang will parse the code for you, and perform overload resolution, you won't have to deal with the C++ languages rules, and you'll be able to concentrate on the problem at hand.
However this kind of technique cannot identify the virtual overrides that are unused, since they could be called by third-party code you cannot reason about.