Using documentParser function in Teradata Aster


I'm working with Teradata Aster and am trying to parse a PDF (or HTML) file so that its contents are inserted into a table in the Beehive database in Aster. The entire PDF should correspond to a single row of data in the table.

This is to be done with one of Aster's SQL-MR functions, documentParser. It will produce a text file (.rtf) containing a single row built by parsing all the chapters of the PDF file, which will then be loaded into the table in Beehive.

I have been given this script, which shows the use of documentParser and the other steps involved in the parsing process:

# SHELL INSTRUCTIONS
# encode the file as base64 (change file names to your relevant file)
base64 pp.pdf > pp.b64

# prepare a load file (remove any previous one)
rm -f my_load_file.txt

# get the content of the file
var=$(cat pp.b64)

# write one CSV row to the load file: "file name","content"
echo "\"pp.b64\",\"$var\"" >> my_load_file.txt

# create staging table
act -U db_superuser -w db_superuser -d beehive -c "drop table if exists public.cf_load_file;"
act -U db_superuser -w db_superuser -d beehive -c "create dimension table public.cf_load_file(file_name varchar, content varchar);"

# load into staging table
ncluster_loader -U db_superuser -w db_superuser -d beehive --csv --verbose public.cf_load_file my_load_file.txt

# use document parser to load the clean text (you will need to create the table beforehand)
act -U db_superuser -w db_superuser -d beehive -c "INSERT INTO got_data.cf_got_text_data (file_name, content) SELECT * FROM documentParser (ON public.cf_load_file documentCol ('content') mode ('text'));"

# done
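
One thing that may be worth checking in the load-file step: some base64 implementations (GNU coreutils, for instance) wrap their output at 76 columns, so the quoted content field can end up spanning many physical lines. If a single-line record is wanted, a minimal sketch could look like the following. make_load_file is a hypothetical helper name, not something Aster provides; it assumes the file name contains no double quotes, and it labels the row with the input file's base name rather than "pp.b64".

```shell
#!/bin/sh
# Sketch only: make_load_file is a hypothetical helper, not part of Aster.
# Usage: make_load_file INPUT_FILE LOAD_FILE
make_load_file() {
    input=$1
    load_file=$2
    # Encode the file, then strip any line wrapping so the payload is one line.
    encoded=$(base64 < "$input" | tr -d '\n')
    # Write a single CSV row: "file name","base64 content".
    # This overwrites LOAD_FILE, so no separate rm step is needed.
    printf '"%s","%s"\n' "$(basename "$input")" "$encoded" > "$load_file"
}

# Example: make_load_file pp.pdf my_load_file.txt
```

Stripping the newlines does not change the encoded data, since base64 decoders ignore line breaks in wrapped output.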

However, I am stuck on the last step of the script because there appears to be no function called documentParser among the functions available in Aster. This is the error I get:

ERROR:  function "documentparser" does not exist

I tried to search for this function several times with the command \dF, but did not get any match.

I've attached a picture that presents the gist of what I'm trying to do (image: SQL-MR Document Parser).

I would appreciate any help if anyone has experience with this.

There is 1 answer

Answer by topchef

What happened is that someone told you about the documentParser function but never gave you the function archive file (documentParser.zip) to install in Aster. The function does exist, but it is not part of the official Aster Analytics Foundation (AAF). Please contact the person who gave you this information for help.

documentParser belongs to the so-called field functions, which are developed and used only by the Aster field team. That doesn't mean you can't use it, but don't expect support to help you; only whoever gave you access to it can.

If you don't have any such contacts, the next course of action I'd suggest is to go to the Aster Community Network and ask about it there.
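
For reference, once you do have documentParser.zip, installing it from the shell might look roughly like the sketch below. This is an assumption-laden outline rather than a verified procedure: install_sqlmr_archive is a hypothetical wrapper name, the act flags and credentials are simply copied from the script in the question, and the exact \install behavior should be checked against the ACT documentation for your Aster version.

```shell
#!/bin/sh
# Sketch only: install_sqlmr_archive is a hypothetical wrapper. The act flags
# and credentials are copied from the question's script; adjust as needed.
# Usage: install_sqlmr_archive ARCHIVE
install_sqlmr_archive() {
    archive=$1
    # Ask ACT to install the archive into the database (assumed \install syntax).
    act -U db_superuser -w db_superuser -d beehive -c "\\install $archive" || return 1
    # List installed files with \dF, the same command used in the question,
    # so you can confirm the archive now shows up.
    act -U db_superuser -w db_superuser -d beehive -c '\dF'
}

# Example: install_sqlmr_archive documentParser.zip
```

If the archive appears in the \dF listing afterwards, the INSERT ... SELECT from the question should at least get past the "does not exist" error.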