SessionExpiredException occurs when crawling with SolrCloud


I am using SolrCloud 6.1.0 and trying to crawl with ManifoldCF 2.4, but it does not work.

The execution environment is as follows: Java 1.8 (although it was 1.7 when ManifoldCF was installed) and ZooKeeper 3.4.9.

When I start a job in ManifoldCF, the first few items are crawled successfully. After a while, however, a ZooKeeper connection error occurs and some of the nodes in the SolrCloud cluster go down.

Below is the ZooKeeper error from the log.

ERROR org.apache.solr.servlet.SolrDispatchFilter
null:org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /aliases.json
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
    at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:252)
    at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:249)
    at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:65)
    at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:249)
    at org.apache.solr.common.cloud.ZkStateReader.updateAliases(ZkStateReader.java:556)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:169)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.solr.servlet.ProxyUserFilter.doFilter(ProxyUserFilter.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.solr.servlet.SolrHadoopAuthenticationFilter$2.doFilter(SolrHadoopAuthenticationFilter.java:140)
    at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:384)
    at org.apache.solr.servlet.SolrHadoopAuthenticationFilter.doFilter(SolrHadoopAuthenticationFilter.java:145)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.solr.servlet.HostnameFilter.doFilter(HostnameFilter.java:86)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:606)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
    at java.lang.Thread.run(Thread.java:662)
ERROR org.apache.solr.servlet.SolrDispatchFilter
null:org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /aliases.json
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)

I do not know why the ZooKeeper connection is cut off in the middle of crawling.

Could someone please help? Thanks.


There is 1 answer

Answered by AR1

Your problem is that the ZooKeeper session is expiring, as the error itself says:

org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /aliases.json

This likely indicates that your client is pausing for long periods of time, for example while crawling a large document or transferring data over the network, long enough for the ZooKeeper session to expire. You could try extending the session timeout as explained here, but there is a chance that this would only postpone the failure a little without solving the underlying issue. Please refer to the ZooKeeper troubleshooting guide and/or this interesting post for a full resolution.
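One thing to keep in mind when extending the timeout: the ZooKeeper server does not accept whatever value the client asks for. It clamps the requested session timeout between its minSessionTimeout and maxSessionTimeout, which default to 2 and 20 times tickTime. The sketch below only illustrates that arithmetic. The class and method names are made up for this example, and the tickTime of 2000 ms is an assumption (it is the value in the sample zoo.cfg), so check your own configuration.

```java
// Illustration only: mirrors how a ZooKeeper server bounds the session
// timeout requested by a client. SessionTimeout and negotiate() are
// hypothetical names, not part of the ZooKeeper or Solr APIs.
public class SessionTimeout {

    /**
     * Returns the session timeout the server would grant, assuming the
     * default bounds minSessionTimeout = 2 * tickTime and
     * maxSessionTimeout = 20 * tickTime.
     */
    public static int negotiate(int requestedMs, int tickTimeMs) {
        int minMs = 2 * tickTimeMs;
        int maxMs = 20 * tickTimeMs;
        return Math.max(minMs, Math.min(maxMs, requestedMs));
    }

    public static void main(String[] args) {
        int tickTimeMs = 2000; // assumed; read the real value from zoo.cfg
        int[] requestedValues = {1000, 15000, 30000, 120000};
        for (int requested : requestedValues) {
            System.out.printf("requested=%d ms -> effective=%d ms%n",
                    requested, negotiate(requested, tickTimeMs));
        }
    }
}
```

With these assumed defaults, asking for 120000 ms still yields 40000 ms, so raising the client-side value (zkClientTimeout in Solr) beyond the server's maximum has no effect unless maxSessionTimeout or tickTime is raised on the ZooKeeper side as well.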