I am importing data from MySQL table (for selected columns only) and putting it in HDFS. Once this is done, I want to create a table in Hive.
For this I have a schema.sql
file which contains the CREATE TABLE statement for the entire table and I want to generate the new CREATE TABLE statement only for the columns I imported.
Something similar to what I am doing with grep
in the below example.
I used FetchFile
along with ExtractText
but couldn't make it work. How can I achieve this using NiFi processors or even Expression Language if I get the overall schema into an attribute?
Or is there a better way to create table on the imported data?
NiFi can generate Create table statement[s] based on the flowfile content
if you are converting the avro data into ORC format then storing into HDFS then ConvertAvroToORC processor adds
hive.ddl
attribute to the flowfile.PutHDFS processor adds
absolute.hdfs.path
attribute to the flowfile.We can use this hive.ddl, absolute.hdfs.path attributes and create the orc table on top of HDFS directory dynamically.
Flow:
Refer to this link for more details regrads to the above flow.
In NiFi once we pull data by using QueryDatabaseTable,ExecuteSQL processors the format of the data is in AVRO.
We can create Avro tables based on avro schema(.avsc file) and by using ExtractAvroMetaData processor we can extract the schema and keep as flowfile attribute then by using this schema we can create AvroTables dynamically.
Flow:
Example AVRO create table statement:
We are going to change path to the schema url by using ReplaceText processor in the above flow.
Another way using ExecuteSQL processor get all the create table statements (or) columns info from (sys.tables/INFORMATION_SCHEMA.COLUMNS ..etc) from source (if source system permits) and write a script to
map the data types
intohive appropriate types
then store them in yourdesired format
in Hive.EDIT:
To run
grep
command on the flowfile content we need to use ExecuteStreamCommand processorESC Configs:
Then feed the
output stream
relation to ExtractText ProcessorET Configs:
Add new property as
content
Then
content attribute
is added to the flowfile, You can use that attribute and prepare create table statements.