I have been given a medium-sized but complex C project (about 200,000 lines in total) which contains around 100 .h files and nearly as many .c files.
Many of the .h files correspond to equivalent .c files, but there is one .h file in particular, let's call it project_common.h, that #includes many of the other .h files as well as containing about 2000 lines of mostly struct and enum definitions etc. Many of the structs are heavily nested so that their order very much matters.
The structure of the file is roughly:
#include guard
#include <assert.h>
#include <stdint.h>
/* etc */
#include "project_aaa.h"
#include "project_bbb.h"
/* Then another 30 or so lines like this. These are in alphabetical
order and a certain amount of effort has been made that they can be
included in any order. */
/* Then about 2000 lines of struct, enums, function definitions etc. */
I have been tasked with moving most or all of the 2000 lines and either creating new .h files for them or pasting them into one of the existing .h files. One rule is that each header must be able to be included independently of all others. In other words, each header must not need other headers to be included before it. After some effort, I've realised that, even with 25+ years' experience as a senior software engineer, this is not an easy task at all because of the complex, hierarchical nature of the struct definitions.
My problems in particular are:
- It's bad practice to include all those headers in project_common.h, and defeats the point of splitting it all up.
- It's really REALLY hard to split all this up in such a way that the resulting headers can be included from a given C file in any combination.
So what I am asking is, are there any tools out there that can help with refactoring all the .h files into a more optimal configuration, and/or is there a recognised method that's better than trial and error?
So far I've tried moving struct definitions around, but progress is very slow and tedious, and although I have nearly halved the size of project_common.h, the new headers I have created only work if they are included in the right order.
I don't know about refactoring tools for this purpose, but as a salient matter of code style, I hold that every header and regular source file, X, should itself #include every other header that directly declares any function or variable that X defines or directly references, and every header defining a macro that X directly uses, and only those. That applies to #includeing system and third-party headers just as much as to your project's internal headers.
You may have already come a long way in that direction in support of the goal of making it possible to #include any header individually. However, it seems clear that you cannot be fully adhering to that principle when you say "the new headers I have created only work if they are included in the right order."
If each header #includes all the other headers providing declarations that it needs itself, and also provides effective guards against multiple inclusion, then the only way to have #include-order problems is if you have a dependency cycle. If you did not already have a cycle when you started then there is no particular reason why your refactoring should produce one. If your refactoring does produce one then that implies that some or all of the contents of the headers in that cycle should be merged into the same header.
Also, and this may be obvious, in choosing where to move the existing declarations, I would recommend focusing on semantic relationships rather than on simple code dependency relationships. Things that are usually used together are a likely choice for cohabitation in the same header, but not so much things that just happen to be in some of the same dependency chains.
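To make the rule about each header including what it needs concrete, here is a minimal sketch squeezed into a single file so that it compiles on its own. Each marked region stands in for a separate header. The names project_point.h, project_rect.h, struct point, struct rect and rect_area are all hypothetical and are not taken from your project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ---- contents of a hypothetical project_point.h ---- */
#ifndef PROJECT_POINT_H
#define PROJECT_POINT_H

#include <stdint.h>   /* uses int32_t, so it includes <stdint.h> itself */

struct point {
    int32_t x;
    int32_t y;
};

#endif /* PROJECT_POINT_H */

/* ---- contents of a hypothetical project_rect.h ---- */
#ifndef PROJECT_RECT_H
#define PROJECT_RECT_H

#include <assert.h>   /* uses assert() */
#include <stddef.h>   /* uses NULL */
#include <stdint.h>   /* uses int64_t */
/* In a real tree this header would also contain
 *     #include "project_point.h"
 * because struct rect embeds struct point. In this single-file
 * sketch the definition is already visible above. */

struct rect {
    struct point top_left;
    struct point bottom_right;
};

/* Area of an axis-aligned rectangle; widened to 64 bits before
 * subtracting so the arithmetic cannot overflow int32_t. */
static inline int64_t rect_area(const struct rect *r)
{
    assert(r != NULL);
    int64_t width  = (int64_t)r->bottom_right.x - r->top_left.x;
    int64_t height = (int64_t)r->bottom_right.y - r->top_left.y;
    return width * height;
}

#endif /* PROJECT_RECT_H */
```

With this layout a source file can include project_rect.h alone, or both headers in either order, and the guards make the repeated inclusion of project_point.h harmless.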
Now, I suppose it's possible that when you say ...
... you mean not just that one can pick and choose headers to include without concern about order and dependencies, but also that no header is permitted to include any of the others. If so, then that's an artificial and difficult-to-sustain provision. It implies, for instance, that wherever you have a structure or union type that embeds (not just points to) an object of one of the project's other internal types, those two types must be declared in the same header. If you happen to be saddled with something like that, then your current task provides a good context for pushing back against it.
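The distinction between embedding and pointing matters because a pointer member only needs a forward declaration, whereas an embedded member needs the complete type. A rough single-file sketch follows, in which each marked region stands in for a separate header. All names (project_list.h, project_node.h, project_pair.h, struct list, struct node, struct node_pair, list_push) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* ---- contents of a hypothetical project_list.h ---- */
#ifndef PROJECT_LIST_H
#define PROJECT_LIST_H

#include <stddef.h>      /* uses size_t */

struct node;             /* forward declaration: enough for a pointer member */

struct list {
    struct node *head;   /* pointer to an incomplete type is allowed */
    size_t length;
};

#endif /* PROJECT_LIST_H */

/* ---- contents of a hypothetical project_node.h ---- */
#ifndef PROJECT_NODE_H
#define PROJECT_NODE_H

struct node {
    int value;
    struct node *next;
};

#endif /* PROJECT_NODE_H */

/* ---- contents of a hypothetical project_pair.h ---- */
#ifndef PROJECT_PAIR_H
#define PROJECT_PAIR_H

/* A real project_pair.h would need #include "project_node.h" here:
 * embedding requires the complete definition of struct node. */
struct node_pair {
    struct node first;
    struct node second;
};

#endif /* PROJECT_PAIR_H */

/* Would live in a source file that includes both project_list.h and
 * project_node.h, since it dereferences a struct node. */
static inline void list_push(struct list *l, struct node *n)
{
    assert(l != NULL && n != NULL);
    n->next = l->head;
    l->head = n;
    l->length++;
}
```

So project_list.h stays independent of project_node.h, while project_pair.h cannot avoid depending on it. Under a rule that forbids headers from including each other, struct node and struct node_pair would be forced into the same header.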
Finally, as a practical matter, I would start at the top of the file and work downward from there. This way you will work first with the declarations that have the fewest dependencies. You may even find it useful to think of this and work on it as a series of many small refactorings instead of one huge one.