I am using the Hortonworks Sandbox in Azure with Spark 1.6. I have a Hive database populated with TPC-DS sample data. I want to read SQL queries from external files and run them against the Hive dataset in Spark. I followed the topic "Using hive database in spark", but it uses only a single table from the dataset, and it writes the SQL query inside the Spark code again. I need to define the whole database as my source and query against that. I think I should use DataFrames, but I am not sure and do not know how. I also want to import the SQL queries from an external .sql file rather than write them out again. Could you please guide me on how to do this? Thank you very much!
How to use a whole Hive database in Spark and read SQL queries from external files?
5.8k views Asked by Fardin Behboudi At
There is 1 answer
Spark can read data directly from Hive tables. You can create and drop Hive tables from Spark, and you can run any HiveQL operation through Spark. For this you need to use Spark's HiveContext.
From the Spark documentation:
Spark HiveContext provides a superset of the functionality provided by the basic SQLContext. Additional features include the ability to write queries using the more complete HiveQL parser, access to Hive UDFs, and the ability to read data from Hive tables. To use a HiveContext, you do not need to have an existing Hive setup.
For more information, see the Spark documentation.
To avoid writing SQL in your code, you can use a property file that holds all your Hive queries, and then refer to each query by its key in your code.
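As a rough illustration of the property-file idea, the sketch below parses property-file text with `java.util.Properties` and looks a query up by its key. The helper name `QueryStore`, the key `query.store_sales.count`, and the database name `tpcds` are made up for this example and are not taken from the answer; `store_sales` is one of the standard TPC-DS tables.

```scala
import java.io.StringReader
import java.util.Properties

// Hypothetical helper: holds queries loaded from property-file text.
object QueryStore {
  // Parse "key=value" lines into a Properties object.
  def load(text: String): Properties = {
    val props = new Properties()
    props.load(new StringReader(text))
    props
  }

  // Look up a query by key and fail loudly if the key is missing.
  def query(props: Properties, key: String): String =
    Option(props.getProperty(key))
      .map(_.trim)
      .getOrElse(throw new NoSuchElementException(s"No query for key: $key"))
}

object QueryStoreDemo {
  // Example file contents; the key and database name are illustrative only.
  val sample: String =
    "query.store_sales.count=SELECT COUNT(*) FROM tpcds.store_sales\n"

  def main(args: Array[String]): Unit = {
    val props = QueryStore.load(sample)
    println(QueryStore.query(props, "query.store_sales.count"))
  }
}
```

Keeping one query per key means the Scala code only ever mentions the key, so queries can be changed without recompiling the job.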
Please see below the implementation of Spark HiveContext and use of property file in Spark Scala.
Entry in the properties file:
Spark submit command to run this job:
Note: The property file must be stored at an HDFS location.
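A minimal sketch of how the pieces described above might fit together in Spark 1.6 with Scala. It is not the answer's original code: the object name `HiveQueryJob`, the argument layout, and the database name `tpcds` are assumptions made for illustration. It reads the property file from HDFS through the Hadoop `FileSystem` API, switches to the Hive database with `USE` so that every table in it can be queried by name, and runs the query stored under the given key.

```scala
import java.io.InputStream
import java.util.Properties

import org.apache.hadoop.fs.Path
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Hypothetical job: names, keys, and paths are illustrative assumptions.
object HiveQueryJob {

  // Read key=value pairs from an input stream, closing it afterwards.
  def loadProperties(in: InputStream): Properties = {
    val props = new Properties()
    try props.load(in) finally in.close()
    props
  }

  def main(args: Array[String]): Unit = {
    require(args.length >= 2, "usage: HiveQueryJob <hdfs-properties-path> <query-key>")
    val propsPath = new Path(args(0)) // HDFS location of the property file
    val queryKey = args(1)            // key of the query to run

    val sc = new SparkContext(new SparkConf().setAppName("HiveQueryJob"))
    val hiveContext = new HiveContext(sc)

    // Resolve the file system from the path and open the property file.
    val fs = propsPath.getFileSystem(sc.hadoopConfiguration)
    val props = loadProperties(fs.open(propsPath))

    // Assumption: the TPC-DS tables live in a Hive database named "tpcds".
    hiveContext.sql("USE tpcds")

    val query = props.getProperty(queryKey)
    require(query != null, s"No query found for key: $queryKey")

    // sql() returns a DataFrame; show() prints its first rows.
    hiveContext.sql(query).show()

    sc.stop()
  }
}
```

Because `sql()` returns a DataFrame, the result can also be filtered, joined, or written out instead of being printed.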