How to extract text from a PDF while preserving the original formatting (with CTX_DOC)?

Asked by pradeep on 21 June 2015 at 08:31

I use this code to filter text from a PDF file:

create or replace directory pdf_dir as '&1';

create or replace directory l_curr_dir as '&3';

declare
  ll_clob     CLOB;
  l_bfile     BFILE;
  l_filename  VARCHAR2(200) := '&2';
begin
  -- Drop leftovers from a previous run; ignore errors if they do not exist.
  begin
    ctx_ddl.drop_policy('test_policy1');
  exception when others then
    null;
  end;

  begin
    ctx_ddl.drop_preference('testfilter');
  exception when others then
    null;
  end;

  ctx_ddl.create_preference('testfilter', 'AUTO_FILTER');
  ctx_ddl.create_policy('test_policy1', 'testfilter');

  l_bfile := bfilename('PDF_DIR', l_filename);

  dbms_lob.fileopen(l_bfile, dbms_lob.file_readonly);

  -- Filter the PDF to plain text; the result is returned in ll_clob.
  ctx_doc.policy_filter(
      policy_name => 'test_policy1'
    , document    => l_bfile
    , restab      => ll_clob
    , plaintext   => true
    , charset     => 'US7ASCII'
  );

  dbms_lob.fileclose(l_bfile);

  -- Write the filtered text to the file named by &4 in L_CURR_DIR.
  dbms_xslprocessor.clob2file(ll_clob, 'L_CURR_DIR', '&4');
end;
/

This works for me, but is there any way to keep tabular data intact? Right now the filter outputs the text phrase by phrase, or line by line.

For example, if the PDF contains values like:

Name:            Amount  
Pradeep          100 USD

I want the output as is, but the current setup gives output like this:

Name:
Amount
Pradeep
100 USD

Is there any way to preserve the original layout of the text in the PDF?

Is it possible to change the filter?

There are 0 answers
