I am facing an issue when trying to scan an Accumulo table. A short summary of my environment:
Single-node setup on localhost for all components involved:
- accumulo 1.8.0
- zookeeper 3.4
- hadoop 2.7.3
OS:
- Ubuntu 16.04, 64 bit
Java:
- 1.8.0_111-8u111-b14-2ubuntu0.16.04.2-b14
Now to the issue.
First I created a user and then created a table with that user as owner. I was able to insert data into the table using a Java client.
Later I wanted to check what I had inserted, and for simplicity's sake I chose the Accumulo shell.
When I run the command scan -t <table>, it returns immediately and gives me no results. The odd thing is that the tablet server status page (localhost:9995) shows that the table in question has approximately 110K entries.
The tablet server status screenshot:
Next I checked the size of the tablets in hdfs. The size implies to me that there is data:
1062429 2016-12-15 23:19 /accumulo/tables/c/default_tablet/A000001t.rf
Another table where I have the same issue has an even larger rf file (it has even more entries):
12433646 2016-12-15 22:23 /accumulo/tables/a/default_tablet/A000000i.rf
Next I turned on the debug mode in the shell:
debug on
Then I ran the scan command again. The output:
scan
2016-12-16 00:01:38,113 [rpc.ThriftUtil] TRACE: Opening normal transport
2016-12-16 00:01:38,114 [impl.ThriftTransportPool] TRACE: Creating new connection to connection to localhost:9997
2016-12-16 00:01:38,131 [impl.ThriftTransportPool] TRACE: Returned connection localhost:9997 (120000) ioCount: 7130
2016-12-16 00:01:38,131 [admin.TableOperations] TRACE: tid=14 Checking if table tweets exists...
2016-12-16 00:01:38,132 [admin.TableOperations] TRACE: tid=14 Checked existance of true in 0.000 secs
2016-12-16 00:01:38,132 [impl.ThriftTransportPool] TRACE: Using existing connection to localhost:9997
2016-12-16 00:01:38,146 [impl.ThriftTransportPool] TRACE: Returned connection localhost:9997 (120000) ioCount: 7130
2016-12-16 00:01:38,147 [impl.ThriftTransportPool] TRACE: Using existing connection to localhost:9997
2016-12-16 00:01:38,158 [impl.ThriftTransportPool] TRACE: Returned connection localhost:9997 (120000) ioCount: 7130
2016-12-16 00:01:38,158 [admin.TableOperations] TRACE: tid=14 Checking if table tweets exists...
2016-12-16 00:01:38,159 [admin.TableOperations] TRACE: tid=14 Checked existance of true in 0.000 secs
2016-12-16 00:01:38,159 [impl.ThriftTransportPool] TRACE: Using existing connection to localhost:9997
2016-12-16 00:01:38,168 [impl.ThriftTransportPool] TRACE: Returned connection localhost:9997 (120000) ioCount: 7130
2016-12-16 00:01:38,168 [impl.ThriftTransportPool] TRACE: Using existing connection to localhost:9997
2016-12-16 00:01:38,170 [impl.ThriftTransportPool] TRACE: Returned connection localhost:9997 (120000) ioCount: 130
2016-12-16 00:01:38,170 [shell.Shell] DEBUG: Found no scan iterators to set
2016-12-16 00:01:38,177 [impl.TabletLocatorImpl] TRACE: tid=14 Locating tablet table=c row= skipRow=false retry=false
2016-12-16 00:01:38,178 [impl.TabletLocatorImpl] TRACE: tid=14 Located tablet c<< at localhost:9997 in 0.000 secs
2016-12-16 00:01:38,178 [impl.ThriftTransportPool] TRACE: Using existing connection to localhost:9997
2016-12-16 00:01:38,178 [impl.ThriftScanner] TRACE: tid=14 Starting scan tserver=localhost:9997 tablet=c<< range=(-inf,+inf) ssil=[] ssio={}
2016-12-16 00:01:38,374 [impl.ThriftScanner] TRACE: tid=14 Completely finished scan in 0.195 secs #results=0
2016-12-16 00:01:38,374 [impl.ThriftTransportPool] TRACE: Returned connection localhost:9997 (120000) ioCount: 262
2016-12-16 00:01:38,374 [admin.TableOperations] TRACE: tid=14 Fetching list of tables...
2016-12-16 00:01:38,374 [admin.TableOperations] TRACE: tid=14 Fetched 6 table names in 0.000 secs
2016-12-16 00:01:38,374 [impl.ThriftTransportPool] TRACE: Using existing connection to localhost:9997
2016-12-16 00:01:38,375 [impl.ThriftTransportPool] TRACE: Returned connection localhost:9997 (120000) ioCount: 154
2016-12-16 00:01:38,375 [admin.TableOperations] TRACE: tid=14 Fetching list of namespaces...
2016-12-16 00:01:38,375 [admin.TableOperations] TRACE: tid=14 Fetched 2 namespaces in 0.000 secs
To me the output looks fine: the scan command finds the table and the tablet belonging to it. But no results are shown.
Any insight on what I am doing wrong or missing would be appreciated.
It's quite likely that the records are hidden from the Accumulo user performing the scan. Even if that user created the table, or is the Accumulo root user, they must hold authorizations that satisfy the visibility label on each entry; otherwise the scan simply returns nothing.
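If you have records of what you ingested, or the code you used to ingest it, check whether a visibility (security) expression was supplied. If it was, your user needs to be granted the matching authorizations before they can read those entries. This can be done in the Accumulo shell: getauths shows what the user currently holds, and setauths -u <user> -s <auth1,auth2> grants them.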
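
To illustrate why the monitor can report entries that a scan never returns, here is a toy model in plain Java of the filtering described above. It does not use the Accumulo API: the class VisibilitySketch, its methods, and the labels public and internal are made up for illustration. It also only handles labels joined with &, whereas real visibility expressions additionally allow | and parentheses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: NOT the Accumulo API. All names here are hypothetical.
public class VisibilitySketch {

    // A stored entry: a row id plus the visibility label it was written with.
    // An empty label means "no restriction".
    static final class Entry {
        final String row;
        final String visibility;

        Entry(String row, String visibility) {
            this.row = row;
            this.visibility = visibility;
        }
    }

    // Simplified check: every label joined by '&' must be among the
    // authorizations used for the scan.
    static boolean isVisible(String visibility, Set<String> auths) {
        if (visibility.isEmpty()) {
            return true;
        }
        for (String label : visibility.split("&")) {
            if (!auths.contains(label)) {
                return false;
            }
        }
        return true;
    }

    // Returns the rows a scan with the given authorizations would see.
    static List<String> scan(List<Entry> table, Set<String> auths) {
        List<String> rows = new ArrayList<>();
        for (Entry e : table) {
            if (isVisible(e.visibility, auths)) {
                rows.add(e.row);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Entry> table = Arrays.asList(
                new Entry("tweet1", "public"),
                new Entry("tweet2", "public&internal"));

        // The table holds two entries, but a user without authorizations sees none.
        System.out.println(scan(table, Collections.<String>emptySet()).size()); // 0

        Set<String> one = new HashSet<>(Arrays.asList("public"));
        System.out.println(scan(table, one)); // [tweet1]

        Set<String> both = new HashSet<>(Arrays.asList("public", "internal"));
        System.out.println(scan(table, both)); // [tweet1, tweet2]
    }
}
```

In this model the first scan matches the situation in the question: the data is stored and counted, but a scan with an empty set of authorizations returns zero results.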