I use this code to filter text from a PDF file:
create or replace directory pdf_dir as '&1';
create or replace directory l_curr_dir as '&3';
declare
  ll_clob    CLOB;
  l_bfile    BFILE;
  l_filename VARCHAR2(200) := '&2';
begin
  -- Drop objects left over from a previous run. Each drop has its own
  -- block so that a failure in one does not skip the other.
  begin
    ctx_ddl.drop_policy('test_policy1');
  exception when others then
    null;
  end;
  begin
    ctx_ddl.drop_preference('testfilter');
  exception when others then
    null;
  end;
  ctx_ddl.create_preference('testfilter', 'AUTO_FILTER');
  ctx_ddl.create_policy('test_policy1', 'testfilter');
  l_bfile := bfilename('PDF_DIR', l_filename);
  dbms_lob.fileopen(l_bfile);
  -- Filter the PDF to plain text using the policy created above.
  ctx_doc.policy_filter(
      policy_name => 'test_policy1'
    , document    => l_bfile
    , restab      => ll_clob
    , plaintext   => true
    , charset     => 'US7ASCII'
  );
  dbms_lob.fileclose(l_bfile);
  DBMS_XSLPROCESSOR.clob2file(ll_clob, 'L_CURR_DIR', '&4');
end;
/
The solution works for me, but is there any way to preserve tabular data? Right now it filters the text phrase by phrase or line by line.
For example, if the PDF contains values like:
Name: Amount
Pradeep 100 USD
I want the output as is, but the current setup gives output like this:
Name:
Amount
Pradeep
100 USD
Is there any way to keep the original layout of the text in the PDF?
Is it possible to change the filter?