Oracle SQL - Extract some words only from a column?

163 views Asked by At

I am working with Oracle SQL and I have a column with specific words that I want to extract. For example, I have:

MENACE DE MORT REITEREE EN RECIDIVE
VIOLENCES SUR CONJOINT

And I want to extract:

MENACE MORT
VIOLENCES CONJOINT

I've tried using regex_replace to replace specific words found in my list of words, but I have not been able to extract only the specific words. Is there a way to do this? I have about 50 specific words to extract.

Thank you in advance for your help!

3

There are 3 answers

4
d r On BEST ANSWER

In any case you will need a list of words (and probably some order of the wrords) to do the job. One of the options is to create a list of words and store them in a table. In this answer I will create CTE (words) as a sample. But first some sample data to work with:

WITH            --  S a m p l e   D a t a :
    tbl (ID, TXT) AS
        (   
            Select 1, 'MENACE DE MORT REITEREE EN RECIDIVE' From Dual Union All
            Select 2, 'VIOLENCES SUR CONJOINT' From Dual Union All
            Select 3, 'MALTRAITANCE PSYCHOLOGIQUE SUR CONJOINT' From Dual
        ),

... and words list ...

    words_str AS
        (     
            Select '\MENACE\VIOLENCES\MALTRAITANCE\PSYCHOLOGIQUE\MORT\CONJOINT\' "WORDS_TO_RETURN" From Dual
        ), 
    words AS
        (
            Select  LEVEL "WORD_ID", SubStr(WORDS_TO_RETURN, InStr(WORDS_TO_RETURN, '\', 1, LEVEL) + 1, InStr(WORDS_TO_RETURN, '\', 1, LEVEL + 1) - InStr(WORDS_TO_RETURN, '\', 1, LEVEL) -1) "WORDS"
            From    words_str
            Connect By LEVEL <= Length(WORDS_TO_RETURN) - Length(Replace(WORDS_TO_RETURN, '\', '')) - 1
        )
WORD_ID WORDS
1 MENACE
2 VIOLENCES
3 MALTRAITANCE
4 PSYCHOLOGIQUE
5 MORT
6 CONJOINT

You can now join your table with the words and truncate your text to the list of words from above...

--  M a i n   S Q L :
SELECT  DISTINCT ID, TXT, 
        LISTAGG(WORDS, ' ') OVER(Partition By ID, TXT) "SHORT_TXT"
FROM   (   Select  t.ID, t.TXT,  w.WORD_ID, w.WORDS
           From tbl t
           Inner Join  words w ON(  InStr('\' || Replace(t.TXT, ' ', '\') || '\' , '\' || w.WORDS || '\') > 0 )
           Order By t.ID, w.WORD_ID
        )
ORDER BY ID

Result should be ...

ID TXT SHORT_TXT
1 MENACE DE MORT REITEREE EN RECIDIVE MENACE MORT
2 VIOLENCES SUR CONJOINT VIOLENCES CONJOINT
3 MALTRAITANCE PSYCHOLOGIQUE SUR CONJOINT MALTRAITANCE PSYCHOLOGIQUE CONJOINT
3
Andrew On

You can use a simple regex and connect by to split the string into rows, and select the rows you want from that. Maybe store the values you want in a table and join to it.

with temp as  (
       select 1 as someID, 'MENACE DE MORT REITEREE EN RECIDIVE' as someStr  from dual
     )

select 

  someid,
  listagg(str, ' ') within group ( order by str) 
  from (
SELECT distinct someID ,trim(regexp_substr(someStr, '[^ ]+', 1, level)) str
  FROM (SELECT someID, someStr FROM temp) t
CONNECT BY instr(someStr, ' ', 1, level - 1) > 0
) t
  where
str in (
'MENACE' , 'MORT'
  )
group by
  someid

Fiddle

0
p3consulting On

You could also

CREATE INDEX index_name ON target_table(target_column) INDEXTYPE IS CTXSYS.CONTEXT;

And then instead of playing with CTE, recursive queries, and other goodies ... just store the search expression (with boolean operators) in your "query" table: e.g. 'MENACE AND MORT', ... (see doc here: https://docs.oracle.com/en/database/oracle/oracle-database/21/ccref/oracle-text-CONTAINS-query-operators.html#GUID-6410B783-FC9A-4C99-B3AF-9E0349AA43D1) then

SELECT ... FROM target_table t, queries q
WHERE CONTAINS(t.target_column, q.query_column) > 0;