I am importing data from MySQL table (for selected columns only) and putting it in HDFS. Once this is done, I want to create a table in Hive.
For this I have a schema.sql file which contains the CREATE TABLE statement for the entire table and I want to generate the new CREATE TABLE statement only for the columns I imported.
Something similar to what I am doing with grep in the below example.
I used FetchFile along with ExtractText but couldn't make it work. How can I achieve this using NiFi processors or even Expression Language if I get the overall schema into an attribute?
Or is there a better way to create table on the imported data?

NiFi can generate Create table statement[s] based on the flowfile content
if you are converting the avro data into ORC format then storing into HDFS then ConvertAvroToORC processor adds
hive.ddlattribute to the flowfile.PutHDFS processor adds
absolute.hdfs.pathattribute to the flowfile.We can use this hive.ddl, absolute.hdfs.path attributes and create the orc table on top of HDFS directory dynamically.
Flow:
Refer to this link for more details regrads to the above flow.
In NiFi once we pull data by using QueryDatabaseTable,ExecuteSQL processors the format of the data is in AVRO.
We can create Avro tables based on avro schema(.avsc file) and by using ExtractAvroMetaData processor we can extract the schema and keep as flowfile attribute then by using this schema we can create AvroTables dynamically.
Flow:
Example AVRO create table statement:
We are going to change path to the schema url by using ReplaceText processor in the above flow.
Another way using ExecuteSQL processor get all the create table statements (or) columns info from (sys.tables/INFORMATION_SCHEMA.COLUMNS ..etc) from source (if source system permits) and write a script to
map the data typesintohive appropriate typesthen store them in yourdesired formatin Hive.EDIT:
To run
grepcommand on the flowfile content we need to use ExecuteStreamCommand processorESC Configs:
Then feed the
output streamrelation to ExtractText ProcessorET Configs:
Add new property as
content
Then
content attributeis added to the flowfile, You can use that attribute and prepare create table statements.