How can I connect Apache Nutch 2.x to a remote HBase cluster?

I have two machines. One runs HBase 0.92.2 in pseudo-distributed mode; the other runs the Nutch 2.x crawler. How can I configure them so that the HBase 0.92.2 machine acts as the back-end storage and the Nutch 2.x machine acts as the crawler?
Asked by zahid adeel
I finally got it working, and it was easy to do. I am sharing my experience here in case it helps someone.

1- Edit hbase-site.xml on the HBase machine for pseudo-distributed mode.
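For reference, a minimal pseudo-distributed hbase-site.xml sketch. The HDFS URL, port, and the hostname "master" are assumptions based on the example below; adjust them to your own setup:

```xml
<!-- Sketch of hbase-site.xml on the HBase machine (pseudo-distributed).
     "master" and hdfs://master:9000 are example values, not required names. -->
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:9000/hbase</value>
  </property>
  <property>
    <!-- true = run HBase daemons in separate JVMs (pseudo-distributed) -->
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <!-- The ZooKeeper quorum remote clients (Nutch) will connect to -->
    <name>hbase.zookeeper.quorum</name>
    <value>master</value>
  </property>
</configuration>
```

The hbase.zookeeper.quorum value is what remote clients resolve, which is why the /etc/hosts change in the next step matters.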
2- MOST IMPORTANT: on the HBase machine, map your hostname to the machine's real network IP in /etc/hosts instead of the loopback address, like this (here the HBase machine's IP is 10.11.22.189):

10.11.22.189 master localhost

Note: if you leave the HBase machine's hostname mapped to the loopback IP, the remote Nutch crawler won't be able to connect to it.
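The reason this matters: HBase and ZooKeeper advertise the hostname they resolve locally, so if that hostname maps to 127.0.0.1 the remote client is told to connect to itself. A quick sketch of what the corrected mapping looks like and how to check it (10.11.22.189 and "master" are the example values from this answer):

```shell
# Example of the corrected /etc/hosts content on the HBase machine.
# Written to /tmp here only for illustration; edit the real /etc/hosts as root.
cat > /tmp/hosts.example <<'EOF'
127.0.0.1   localhost
10.11.22.189 master localhost
EOF

# The hostname HBase advertises ("master") must resolve to the real
# network IP, not the loopback address:
grep '^10\.11\.22\.189' /tmp/hosts.example
```

From the Nutch machine you can then sanity-check reachability with something like `nc -z master 2181` (ZooKeeper's default client port).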
3- Copy (or symlink) hbase-site.xml into $NUTCH_HOME/conf on the Nutch machine.

4- Start your crawler and see it working.
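Steps 3 and 4 can be sketched as the following shell session. The install paths are assumptions, and the exact crawl commands vary slightly across Nutch 2.x releases, so treat this as a template rather than exact syntax:

```shell
# Assumed install locations -- adjust to your machines.
HBASE_CONF=/opt/hbase/conf
NUTCH_HOME=/opt/nutch

# Step 3: give Nutch the client-side HBase config so it can find
# the remote ZooKeeper quorum.
cp "$HBASE_CONF/hbase-site.xml" "$NUTCH_HOME/conf/hbase-site.xml"

# Step 4: run a crawl from the Nutch machine. In Nutch 2.x the bundled
# crawl script takes a seed directory, a crawl ID, and a round count
# (flag details differ between 2.x versions -- check bin/crawl --help).
cd "$NUTCH_HOME"
bin/crawl urls/ myFirstCrawl 2
```

If the /etc/hosts fix from step 2 was applied, the crawler should now write its webpage table into the remote HBase instance instead of failing on a connection to localhost.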