I want to write a case statement which can extract value for a particular substring from a column named details which has multiple occurrences for [address] is it possible with REGEX along with case?

sample Data in the column:

[address]:kattengat 28
[address]:1012 SZ
[address]: Amsterdam

The below SQL only outputs:

kattengat 28

Select case when to_number(instr(details),'[address')>0 then substr(details,REGEXP_INSTR(details,'address',1,1)+8,instr(substr(details,REGEXP_INSTR(details,'address',1,1)+8,'[')-1)) else '' end from table_name;

Expected output is : 
kattengat 28 1012 SZ Amsterdam

Create table statement:

Create table test (id number(10), details clob);

Insert statement :

insert into test (id, details) values (1,to_clob ('[ADDRESS    ] kattengat 28
             [NAME       ] ALEX
             [PHONE      ] 65438
             [ADDRESS    ] 1012 SZ
             [DOB        ] 1st Jan 1998
             [ADDRESS    ] Amsterdam')):

Please note I don't want to concat and add statements rather looking for a solution which can extract values associated with the substring [address] based on the number of occurrences of the substring in a single line

1

There are 1 answers

11
Littlefoot On BEST ANSWER

Here's one option:

SQL> with test (col) as
  2    (select '[address]:kattengat 28
  3  [address]:1012 SZ
  4  [address]: Amsterdam' from dual
  5    )
  6  select trim(replace(regexp_substr(replace(col, chr(10), '#'), '[^#]+', 1, column_value), '[address]:', '')) result
  7  from test cross join
  8    table(cast(multiset(select level from dual
  9                        connect by level <= regexp_count(col, ':')
 10                       ) as sys.odcinumberlist));

RESULT
--------------------------------------------------------------------------------
kattengat 28
1012 SZ
Amsterdam

SQL>

What does it do?

  • lines #1 - 5 - sample data
  • line #6:
    • regexp_substr part of code is responsible for splitting source column value into separate rows
    • it affects not the original value, but the one whose new line character (chr(10)) is replaced (the 2nd replace) by #, and that character is used as a separator for regexp_substr
  • the 1st replace removes [address]: from the source
  • trim removes leading/trailing empty strings (as the one in front of "Amsterdam"
  • lines #8 - 10 are here to remove duplicate values from the final result (if source table doesn't contain only one row, as in my example). If it actually does, then code can be somewhat simplified.

With sample data you posted later (by the way, are you sure there are spaces in front of [NAME] etc.? I guess NOT!):

SQL> select * from test;

        ID DETAILS
---------- --------------------------------------------------
         1 [ADDRESS    ] kattengat 28
           [NAME       ] ALEX
           [PHONE      ] 65438
           [ADDRESS    ] 1012 SZ
           [DOB        ] 1st Jan 1998
           [ADDRESS    ] Amsterdam

Code I previously posted, slightly modified because previously address was in lower case, there were NO spaces within square brackets, and there was a colon sign):

SQL> with temp as
  2    (select trim(replace(regexp_substr(replace(details, chr(10), '#'), '[^#]+', 1, column_value), '[ADDRESS    ]', '')) result
  3     from test cross join
  4     table(cast(multiset(select level from dual
  5                         connect by level <= regexp_count(details, '\[')
  6                        ) as sys.odcinumberlist))
  7    )
  8  select *
  9  from temp
 10  where instr(result, '[') = 0;

RESULT
--------------------------------------------------------------------------------
kattengat 28
1012 SZ
Amsterdam

SQL>

If you want to get result in one line, you could aggregate values returned by that query as

SQL> with temp as
  2    (select trim(replace(regexp_substr(replace(details, chr(10), '#'), '[^#]+', 1, column_value), '[ADDRESS    ]', '')) result,
  3            column_value cv
  4     from test cross join
  5     table(cast(multiset(select level from dual
  6                         connect by level <= regexp_count(details, '\[')
  7                        ) as sys.odcinumberlist))
  8    )
  9  select listagg (result, ', ') within group (order by cv) final_result
 10  from temp
 11  where instr(result, '[') = 0;

FINAL_RESULT
--------------------------------------------------------------------------------
kattengat 28, 1012 SZ, Amsterdam

SQL>