Filtering the Opengrok indexes based on an ELF file


I have a project for a Linux-based embedded application. I have an ELF file, and I want OpenGrok to index only the symbols that are part of that ELF file, excluding all non-relevant/non-compiled portions of the project files. Is this possible with OpenGrok indexing? If so, what is the command to generate such an index? Currently I use the command below to generate the index for the entire source:

java -Djava.util.logging.config.file=/opengrok/etc/logging.properties \
    -jar /opengrok/dist/lib/opengrok.jar \
    -c /usr/local/bin/ctags \
    -s /opengrok/src -d /opengrok/data -H -P -S -G \
    -W /opengrok/etc/configuration.xml -U http://localhost:8080/source


There are 2 answers

Adam Hornáček On

If you want to include or ignore only specific files (in this case, ELF files), you can use the following options:

-I (--include) - Only files matching this pattern will be examined. Pattern supports wildcards (example: -I '*.java' -I '*.c'). Option may be repeated.
  

-i (--ignore) - Ignore matching files (prefixed with 'f:' or no prefix) or directories (prefixed with 'd:'). Pattern supports wildcards (example: -i '*.so' -i d:'test*'). Option may be repeated.
Vlad On

It is not clear to me what exactly is meant by "non relevant/ non compiled portion of the project files" or by non-linked/non-compiled symbols, so I will describe how ELF file analysis works in OpenGrok. You can then decide whether this works for your use case or whether filing a new issue is in order.

The ELF analyzer goes through the following ELF sections:

  • .debug_str
  • .comment
  • .data
  • .data1
  • .rodata
  • .rodata1

plus all sections with sh_type equal to SHT_STRTAB. The latter contain strings separated by null bytes. From the content of these sections the analyzer extracts all printable strings (using non-printable characters as separators) and concatenates them with the space character. So all the printable strings from all these sections are accumulated into a single string, effectively tokenized by the inserted spaces. These tokens are then stored in the index and therefore become searchable.
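The extraction step described above can be sketched roughly as follows. This is only an illustration, not OpenGrok's actual code: the function name extract_tokens is made up, the section contents are assumed to have been read into byte strings already, and "printable" is assumed to mean ASCII bytes 0x20 to 0x7e, which may differ from the rule the analyzer really applies.

```python
import re

# Runs of printable ASCII; every other byte acts as a separator.
# Assumption: "printable" means bytes 0x20..0x7e. The real analyzer may differ.
_PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]+")


def extract_tokens(section_bytes_list):
    """Join the printable runs from every section into one space-separated string."""
    runs = []
    for data in section_bytes_list:
        runs.extend(m.group().decode("ascii") for m in _PRINTABLE_RUN.finditer(data))
    return " ".join(runs)


if __name__ == "__main__":
    # Hypothetical section contents, made up for this example.
    strtab = b"\x00main\x00printf\x00"
    rodata = b"hello world\x00\x01\x02version 1.0\x00"
    print(extract_tokens([strtab, rodata]))
```

Note that symbol names (main, printf) and string constants from the data sections end up in the same flat string, which is why the index cannot tell them apart.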

With this approach the index will contain not only symbols defined within the program but also names of external symbols referenced from the program (such as functions called from dynamic libraries), plus the contents of some of the global variables (if they contain printable strings).

Also, when the ELF binary is stripped, the .symtab section is removed and the names of symbols defined within the program are lost to the indexer.

Now, it would be possible to traverse the ELF sections in a more intelligent manner and exclude the external references (e.g. calls to dynamic library functions). However, that would thwart the original idea, which was to have a way to perform security vulnerability analysis: if it were known which function has a problem, one could search for all binaries that call that function and so get some idea of the security impact. Alternatively, the extracted tokens could be split into references and definitions.
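To give an idea of what splitting tokens into references and definitions could look like, here is a rough sketch. OpenGrok does not do this today as far as the description above goes. The function name split_symbols is hypothetical, and the code assumes a 64-bit little-endian ELF whose symbol table (.symtab or .dynsym) and matching string table have already been read into byte strings. It relies on the ELF convention that an undefined symbol, i.e. one resolved from another object, has st_shndx equal to SHN_UNDEF (0).

```python
import struct

SHN_UNDEF = 0  # section index of a symbol that is defined in another object
# Elf64_Sym layout: st_name, st_info, st_other, st_shndx, st_value, st_size
ELF64_SYM = struct.Struct("<IBBHQQ")


def split_symbols(symtab, strtab):
    """Return (definitions, references) as lists of symbol names."""
    definitions, references = [], []
    for offset in range(0, len(symtab) - ELF64_SYM.size + 1, ELF64_SYM.size):
        st_name, _info, _other, st_shndx, _value, _size = ELF64_SYM.unpack_from(symtab, offset)
        if st_name == 0:
            continue  # unnamed entry, e.g. the mandatory null symbol at index 0
        end = strtab.index(b"\x00", st_name)  # names are NUL-terminated
        name = strtab[st_name:end].decode("ascii", errors="replace")
        (references if st_shndx == SHN_UNDEF else definitions).append(name)
    return definitions, references
```

With such a split, a search for a vulnerable library function could be limited to references, while a search for where a function lives could be limited to definitions.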