Can't fetch any rows using SOLR DataImportHandler from DB

4.8k views Asked by At

Have a simple test database setup for my first attempt at indexing DB data using the SOLR DIH. Unfortunately, I get the following result from full import each time:

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">15</int>
</lst>
<lst name="initArgs">
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</lst>
<str name="command">full-import</str>
<str name="status">idle</str>
<str name="importResponse"/>
<lst name="statusMessages">
<str name="Time Elapsed">0:0:2.187</str>
<str name="Total Requests made to DataSource">1</str>
<str name="Total Rows Fetched">0</str>
<str name="Total Documents Processed">0</str>
<str name="Total Documents Skipped">0</str>
<str name="Full Dump Started">2011-03-06 21:30:07</str>
<str name="">Indexing failed. Rolled back all changes.</str>
<str name="Rolledback">2011-03-06 21:30:07</str>
</lst>
<str name="WARNING">
This response format is experimental.  It is likely to change in the future.
</str>
</response>

my solrconfig.xml has the following requestHandler defined:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
  <str name="config">data-config.xml</str>
</lst>
</requestHandler>

my data-config.xml contains the following:

<dataConfig>
<dataSource type="JdbcDataSource" 
          driver="com.mysql.jdbc.Driver"
          url="jdbc:mysql://localhost:8889/Dev1" 
          user="root" 
          password="***"/>
<document>
<entity name="business_profile" 
        query="select business_id,business_name,address1,address2,city,state,zip from business_profile">
</entity>
</document>
</dataConfig>

schema.xml field definition:

<field name="business_id" type="int" indexed="true" stored="true"/>
<field name="business_name" type="string" indexed="true" stored="true"/>
<field name="address1" type="string" indexed="true" stored="true"/>
<field name="address2" type="string" indexed="true" stored="true"/>
<field name="city" type="string" indexed="true" stored="true"/>
<field name="state" type="string" indexed="true" stored="true"/>
<field name="zip" type="string" indexed="true" stored="true"/>

If the total requests to datasource displays '1', does that mean the JDBC driver is configured properly and it's a possible MySQL permissions issue at that point or not necessarily?

I've played with the DIH dev console trying to get something back, but it's always 1 request and 0 rows which makes me think it's JDBC or the mySQL permissions. I've ruled out the DB name, port #, & user/pass, but perhaps the JDBC driver is setup incorrectly..?

Thanks


Here's what I'm getting from the solr log

0:0:0:0:0:0:0:1%0 -  -  [07/03/2011:17:50:41 +0000] "GET /solr/dataimport?command=full-import&mode=debug HTTP/1.1" 200 853 
0:0:0:0:0:0:0:1%0 -  -  [07/03/2011:17:50:47 +0000] "GET /solr/dataimport?command=full-import&mode=debug HTTP/1.1" 200 851 
0:0:0:0:0:0:0:1%0 -  -  [07/03/2011:17:51:03 +0000] "GET /solr/dataimport?command=full-import&mode=debug HTTP/1.1" 200 853 
0:0:0:0:0:0:0:1%0 -  -  [07/03/2011:17:51:13 +0000] "GET /solr/dataimport?command=full-import&mode=debug HTTP/1.1" 200 852 
0:0:0:0:0:0:0:1%0 -  -  [07/03/2011:17:51:14 +0000] "GET /solr/dataimport?command=full-import&mode=debug HTTP/1.1" 200 852 
7

There are 7 answers

1
nikhil500 On

Please run the query from data-config.xml directly in MySQL and check if any rows are returned.

Also, please check the error logs of your server (catalina.out if you are using Tomcat) and post any error messages logged there.

0
Md Jawed Shamshedi On

The data-config.xml that you post here, in the url part url="jdbc:mysql://localhost:8889/Dev1" . pls change it to url="jdbc:mysql://localhost/Dev1" and have you put the jdbc driver in the lib directory. please check the jdbc driver where it is in the lib directory or not...

3
Marko Bonaci On

You, of course, added fields business_id, business_name, address1, address2, city, state, zip in your schema.xml?

Have you been able to find any errors in server's log?

I'm not really sure whether you need to name your data source if you have only one, but that's the way to tell the entity which data source to use:

<dataConfig>

    <dataSource 
        name="dev" 
        driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" 
        encoding="UTF-8"
        url="jdbc:sqlserver://____:1433;databaseName=____"
        user="____" 
        password="____" 
        readOnly="true" 
        autoCommit="false" />


    <document>
        <entity 
            name="metadataObject" 
            dataSource="dev" 
            pk="ITEM_MOID" 
            transformer="RegexTransformer,DateFormatTransformer"        
            query="___"

            ...

    </document>
</dataConfig>
1
Prasanna On

Look at the SOLR logs, it would print the stacktrace whenever indexing fails.

0
gb96 On

This is a long-shot, but based on your Solr logs it appears your server has ipv6 enabled.

Some applications don't support ipv6 by default, you might want to check your MySQL configuration to ensure it permits connections via ipv6 on the loopback interface.

See this MySQL version 5.5 guide for example: http://dev.mysql.com/doc/refman/5.5/en/ipv6-local-connections.html

In particular steps 3 and 4 in the above guide can be used to test connectivity, e.g.

shell> mysql -h localhost -u root -pYourRootPassword

mysql> STATUS
0
disjoint_molecule On

I had a similar problem. If your schema.xml (or managed-schema) contains id or unique id like below Ex: <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />

Then you need to select this filed in your data-config.xml as your primary key or set new key with required="true", or remove this part.

0
Khaled On

Are you connecting to remote db using ssh or tunnel? seems to me that you're connecting to remote db. try to connect to local mysql server, if it worked, then you have problem connecting using SSH.