Lily with Morphline and HBase

813 views Asked by At

I'm trying to use an tutorial from Cloudera. (http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/search_hbase_batch_indexer.html)

I have a code to insert objects in Avro format in HBase and I want to insert them to Solr but I don't get anything.

I have been taking a look to the logs:

15/06/12 00:45:00 TRACE morphline.ExtractHBaseCellsBuilder$ExtractHBaseCells: beforeNotify: {lifecycle=[START_SESSION]}
15/06/12 00:45:00 TRACE morphline.ExtractHBaseCellsBuilder$ExtractHBaseCells: beforeProcess: {_attachment_body=[keyvalues={0Name178721/data:avroUser/1434094131495/Put/vlen=237/seqid=0}], _attachment_mimetype=[application/java-hbase-result]}
15/06/12 00:45:00 DEBUG indexer.Indexer$RowBasedIndexer: Indexer _default_ will send to Solr 0 adds and 0 deletes
15/06/12 00:45:00 TRACE morphline.ExtractHBaseCellsBuilder$ExtractHBaseCells: beforeNotify: {lifecycle=[START_SESSION]}
15/06/12 00:45:00 TRACE morphline.ExtractHBaseCellsBuilder$ExtractHBaseCells: beforeProcess: {_attachment_body=[keyvalues={1Name134339/data:avroUser/1434094131495/Put/vlen=237/seqid=0}], _attachment_mimetype=[application/java-hbase-result]}

So, I'm reaing them but I don't know why it isn't indexed anything in Solr. I guess that my morphline.conf is wrong.

morphlines : [
{
    id : morphline1
    importCommands : ["org.kitesdk.**", "org.apache.solr.**", "com.ngdata.**"]
    commands : [
      {
         extractHBaseCells {
          mappings : [
            {
             inputColumn : "data:avroUser"
              outputField : "_attachment_body"
              type : "byte[]"
              source : value
            }
         ]
        }
      }

      #for avro use with type : "byte[]" in extractHBaseCells mapping above
      { readAvroContainer {} }
      {
        extractAvroPaths {
          flatten : true
          paths : {
            name : /name
          }
        }
      }
      { logTrace { format : "output record: {}", args : ["@{}"] } }
    ]
 }
]

I wasn't sure if I had to have an "_attachment_body" field in Solr, but it seems that it isn't necessary, so I guess that readAvroContainer or extractAvroPaths are wrong. I have a "name" field in Solr and my avroUser has a "name" field as well.

{"namespace": "example.avro",
 "type": "record",
 "name": "User",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number",  "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]
}
1

There are 1 answers

0
Wemerson Cesar On

I have all this things working well here. I did this steps:

1) Install hbase-solr-indexer as a service: Fist of all you have to install hbase-solr-indexer. installing hbase-solr-indexing as a service

Add cloudera repos to yum repos for this. After that type:

sudo yum  install hbase-solr-indexer

2) Criate morphline files: ok, you did it.

2) Set the Replication scope for every column family and register a hbase-indexer configuration

Using the Lily HBase NRT Indexer Service

$ hbase shell
hbase shell> disable 'record'
hbase shell> alter 'record', {NAME => 'data', REPLICATION_SCOPE => 1}
hbase shell> enable 'record'

Try to follow the others tutorials above. ;) I was with problems with a NRT solution, but when I followed all that tutorial step by step It worked.

I hope this help someone.