I am working on a small project that reads the full details of every artifact on Maven Central. I came across this page: https://maven.apache.org/repository/central-index.html. It describes what is essentially a three-step workflow:
- downloading the index
- using the Maven Indexer CLI project to unpack the index.gz
- and then using a Lucene viewer such as Luke to export the index as XML
However, while going through the indexer-reader examples and unit tests (https://maven.apache.org/maven-indexer/indexer-reader/index.html), it became clear to me that simply using ChunkReader.splitIterator is enough to get all the details that the three steps above produce.
In fact, the linked page itself suggests the same.
Verbatim - "Indexer Reader is a dependency-less library that is able to read published (remote) index with incremental update support, making usable to integrate published Maven Indexes into any engine without depending on maven-indexer-core and its transitive dependencies."
Question 1: Why does https://maven.apache.org/repository/central-index.html recommend a three-step workflow if the indexer-reader library alone achieves the same result?
Question 2: Is there any documentation that explains how and when the incremental updates and the full index upload happen? I found this blog post from 2009 and would like to confirm that it is still valid: https://blog.sonatype.com/2009/05/nexus-indexer-20-incremental-downloading/
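To make Question 2 concrete, here is a minimal sketch of how I currently understand the incremental bookkeeping. It is an assumption on my part, not something the pages linked above confirm: I assume the published index ships a properties file with keys of the form nexus.index.incremental-N, each holding a counter, and that the matching chunk file is named nexus-maven-repository-index.<counter>.gz. The class name, the helper method, and the sample values are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class IncrementalChunks {
    // ASSUMPTION: key and file-name conventions, not confirmed by the linked documentation.
    static final String KEY_PREFIX = "nexus.index.incremental-";
    static final String FILE_PREFIX = "nexus-maven-repository-index";

    /** Returns the chunk file names in the order the properties list them (incremental-0 first). */
    static List<String> chunkNames(Properties props) {
        List<String> names = new ArrayList<>();
        for (int i = 0; ; i++) {
            String counter = props.getProperty(KEY_PREFIX + i);
            if (counter == null) {
                break; // stop at the first missing incremental-N key
            }
            names.add(FILE_PREFIX + "." + counter.trim() + ".gz");
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample content; a real properties file would be downloaded from the repository.
        String sample = String.join("\n",
                "nexus.index.id=central",
                "nexus.index.chain-id=1234567890",
                "nexus.index.last-incremental=42",
                "nexus.index.incremental-0=42",
                "nexus.index.incremental-1=41");
        Properties props = new Properties();
        props.load(new StringReader(sample));
        System.out.println(chunkNames(props));
    }
}
```

If this picture is right, a client that remembers the last counter it applied would only need to fetch the chunks with a higher counter, and fall back to the full index when the chain id changes. That last part is exactly what I would like to have confirmed.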
Regards