If I want to use distCp on an on-prem hadoop cluster, so it can 'push' data to external cloud storage, what firewall considerations must be made in order to leverage this tool? What ports does the actual transfer of data take place on? Is it via SSH, and/or port 8020? I need to make sure network connectivity is provided for source to destination, but with the least amount of privileges ascribed to it. (i.e., only opening ports that are absolutely needed)
Related Questions in HADOOP
- Can anyoone help me with this problem while trying to install hadoop on ubuntu?
- Hadoop No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster)
- Top-N using Python, MapReduce
- Spark Driver vs MapReduce Driver on YARN
- ERROR: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "maprfs"
- can't write pyspark dataframe to parquet file on windows
- How to optimize writing to a large table in Hive/HDFS using Spark
- Can't replicate block xxx because the block file doesn't exist, or is not accessible
- HDFS too many bad blocks due to "Operation category WRITE is not supported in state standby" - Understanding why datanode can't find Active NameNode
- distcp throws java.io.IOException when copying files
- Hadoop MapReduce WordPairsCount produces inconsistent results
- If my data is not partitioned can that be why I’m getting maxResultSize error for my PySpark job?
- resource manager and nodemanager connectivity issues
- ERROR flume.SinkRunner: Unable to deliver event
- converting varchar(7) to decimal (7,5) in hive
Related Questions in NETWORKING
- How to avoid duplicates with the pull-based subscribe model?
- How to simulate CSMA/CD protocol in ns3?
- Network System - Cisco Packet Tracer
- Adhoc / mesh network not working (with and without batman-adv)
- Algorithm for finding a subset of nodes in a weighted connected graph such that the distance between any pair nodes are under a postive number?
- Python Client-Server Communication with Protocol
- I registered a service in eureka which is resolving through java code. But it is not able to resolve its name when hitting through chrome or postman
- Share files from the server without data or internet usage
- Player names not synchronizing in unity Mirror Networking
- My phone can not visit the server on macos in the same local network
- Unable to ping remote websites from an ipV6 only ubuntu ec2 Instance
- Linux Networking - Routing packets from one network interface to another
- wrong output from Supernetting algorithm
- Mapping localhost port on host to docker container
- Microsoft Message Analyzer disable resolving IP address to their domain names a.k.a turn off AutoIP feature
Related Questions in HDFS
- Can anyoone help me with this problem while trying to install hadoop on ubuntu?
- ERROR: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "maprfs"
- How to optimize writing to a large table in Hive/HDFS using Spark
- Update hadoop hadoop-2.6.5 to haddop 3.x. Operation category WRITE is not supported in state standby
- Copy/Merge multiple HDFS files using Nifi Processor
- HDFS too many bad blocks due to "Operation category WRITE is not supported in state standby" - Understanding why datanode can't find Active NameNode
- distcp throws java.io.IOException when copying files
- ERROR flume.SinkRunner: Unable to deliver event
- Apache flume does not run hadoop 3.1.0 Flume 1.11
- Livy session to submit pyspark from HDFS
- ClickHouse Server Exception: Code: 210.DB::Exception: Fail to read from HDFS:
- Confluent HDFS Sink connector error while connecting HDFS to Hive
- Node Transitioned from NEW to UNHEALTHY and Attempting to remove non-existent node
- Error associated with Azure Datalake Gen2 and Hadoop connection
- How do I directly read files from HDFS using dask?
Related Questions in DISTCP
- distcp throws java.io.IOException when copying files
- How to copy data from an old Hive cluster to a new Hive cluster and keep all Hive metastore meta data?
- s3-dist-cp groupby Regex Capture
- Fastest way to copy large data from HDFS location to GCP bucket using command
- GCS Connector on EMR failing with java.lang.ClassNotFoundException
- How to specify a filter to exclude hdfs file of a partition when calling distcp -update?
- requested yarn user xxx not found when running distcp from hadoop to GCS
- Hadoop distcp does not skip CRC checks
- Hadoop distcp to S3
- distcp one table to another table with different name
- distcp - copy data from cloudera hdfs to cloud storage
- Hadoop distcp: what ports are used?
- DistCP - Even simple copies result in CRC Exceptions
- How should I move files/folders from one Hadoop cluster to another and delete the source contents after moving?
- One single distcp command to upload several files to s3 (NO DIRECTORY)
Popular Questions
- How do I undo the most recent local commits in Git?
- How can I remove a specific item from an array in JavaScript?
- How do I delete a Git branch locally and remotely?
- Find all files containing a specific text (string) on Linux?
- How do I revert a Git repository to a previous commit?
- How do I create an HTML button that acts like a link?
- How do I check out a remote Git branch?
- How do I force "git pull" to overwrite local files?
- How do I list all files of a directory?
- How to check whether a string contains a substring in JavaScript?
- How do I redirect to another webpage?
- How can I iterate over rows in a Pandas DataFrame?
- How do I convert a String to an int in Java?
- Does Python have a string 'contains' substring method?
- How do I check if a string contains a specific word?
Popular Tags
Trending Questions
- UIImageView Frame Doesn't Reflect Constraints
- Is it possible to use adb commands to click on a view by finding its ID?
- How to create a new web character symbol recognizable by html/javascript?
- Why isn't my CSS3 animation smooth in Google Chrome (but very smooth on other browsers)?
- Heap Gives Page Fault
- Connect ffmpeg to Visual Studio 2008
- Both Object- and ValueAnimator jumps when Duration is set above API LvL 24
- How to avoid default initialization of objects in std::vector?
- second argument of the command line arguments in a format other than char** argv or char* argv[]
- How to improve efficiency of algorithm which generates next lexicographic permutation?
- Navigating to the another actvity app getting crash in android
- How to read the particular message format in android and store in sqlite database?
- Resetting inventory status after order is cancelled
- Efficiently compute powers of X in SSE/AVX
- Insert into an external database using ajax and php : POST 500 (Internal Server Error)
I do not believe SSH is used for the actual data transfer, other than you actually logging into the cluster and starting the command, for example.
At a minimum, it would be the RPC data-transfer ports for the NameNodes and Datanodes, so whatever you've configured for
fs.defaultFS,dfs.namenode.rpc-addressanddfs.datanode.address