I have a Solr server set up using the DataImportHandler. With my current settings, a full-import takes 8-9 hours. I'd like to tune the settings to reduce that time, but the documentation isn't very clear about what the various settings do and what side effects they have.
The server is an m2.2xlarge AWS instance (34.2 GB RAM). The Solr version is 3.6.1.2012.07.17.12.45.52. Solr is running on Tomcat 7.0.30, and Tomcat is started with -Xms4096m -Xmx28672m.
From solrconfig.xml, mergeFactor is 10, useCompoundFile is false. From data-config.xml, autoCommit is true, batchSize is -1. The query the DataImportHandler is using returns 6 million records.
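Before even looking at mergeFactor and the like, you should look at the entities in your db-data-config.xml. If you have entities nested inside other entities, they will generate a lot of SQL requests, because the inner entity's query is run once for every row the outer entity returns. With 6 million records, that means millions of extra queries per nested entity. You need to either rework your SQL so that it doesn't need inner entities (for example, by using a join) or look at CachedSqlEntityProcessor and similar options.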
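
To illustrate the CachedSqlEntityProcessor option, here is a minimal sketch of a nested entity. The table and column names (item, feature, item_id) are made up for the example, and the dataSource element is left out, so adapt it to your own schema rather than copying it as-is:

```xml
<dataConfig>
  <!-- dataSource definition omitted; keep the one you already have -->
  <document>
    <!-- Outer entity: one row per Solr document (hypothetical table "item") -->
    <entity name="item" query="SELECT id, name FROM item">
      <!-- Inner entity: the query runs once and its rows are cached in memory,
           then looked up by item_id for each outer row instead of hitting the
           database again (hypothetical table "feature") -->
      <entity name="feature"
              processor="CachedSqlEntityProcessor"
              query="SELECT item_id, description FROM feature"
              where="item_id=item.id"/>
    </entity>
  </document>
</dataConfig>
```

The trade-off is memory: the cached rows of the inner entity are held in the heap for the duration of the import, so this only helps if that result set fits comfortably in the memory you have given Tomcat.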