Using OpenGrok to index Microsoft Office documents etc


Is it possible to use OpenGrok to index PPT, XLS, DOC, and similar formats? Would I have to implement this myself, or is there already a plugin or other method for doing this?


There are 2 answers

Vlad On

There is currently no dedicated analyzer to extract data from these types of documents. However, it should be possible to implement one based on the Java libraries listed in "Read Microsoft Word Documents into Plain Text (DOC, DOCX) in Java" (e.g. Apache POI or Apache Tika).
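To give a rough idea of the text-extraction step such an analyzer would need, here is a minimal sketch. It deliberately does not use Apache POI or Apache Tika: it relies only on the JDK (Java 9+) and handles only .docx files, which are ZIP archives containing word/document.xml. Legacy binary formats (DOC, XLS, PPT) would realistically need one of the libraries above. The class name DocxTextSketch and its method names are hypothetical and are not part of OpenGrok, and wiring the result into OpenGrok's analyzer framework is not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration only; not an OpenGrok class.
final class DocxTextSketch {

    // WordprocessingML namespace used by the w:p (paragraph) and w:t (text) elements.
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the plain text of a .docx stream, one line per paragraph,
    // or an empty string if the archive has no word/document.xml entry.
    static String extractText(InputStream docx)
            throws IOException, ParserConfigurationException, SAXException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    // readAllBytes() stops at the end of the current ZIP entry.
                    return textOf(zip.readAllBytes());
                }
            }
        }
        return "";
    }

    private static String textOf(byte[] xml)
            throws IOException, ParserConfigurationException, SAXException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Reject DOCTYPE declarations so untrusted documents cannot pull in external entities.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        StringBuilder out = new StringBuilder();
        NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            // Simplification: paragraphs nested inside other paragraphs
            // (e.g. text boxes) would be emitted twice.
            NodeList texts = ((Element) paragraphs.item(i)).getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < texts.getLength(); j++) {
                out.append(texts.item(j).getTextContent());
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java DocxTextSketch <file.docx>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.print(extractText(in));
        }
    }
}
```

The extracted string is what a custom analyzer would then hand to the indexer. Headers, footers, comments, and embedded objects are ignored here, which is another reason to prefer a full library for real use.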

Feel free to file a new issue on https://github.com/oracle/opengrok/issues

Richard Ludwig On

There is already an OpenGrok issue for this: https://github.com/oracle/opengrok/issues/492. However, it has been waiting on a plugin interface since 2013.