I need to treat values ​from a string field and break it down into an array

31 views Asked by At

I have a table in bigquery with a string field that I need to read, disassemble and generate a new field with several rows receiving the disassembled field

The field has values ​​like this:

Column A
CC(2.0)+,EXO(BLT7,CFC6,E6D)
CC(2.0)+,EXO(BPT7,CFC5)+
MY(22,23,24)+

And i need to transform like this;

Column B
CC(2.0)+
EXO(BLT7)
EXO(CFC6)
EXO(E6D)
CC(2.0)+
EXO(BPT7)
EXO(CFC5)+
MY(22)
MY(23)
MY(24)+
# transform this: CC(2.0)+,EXO(BLT7,CFC6,E6D)  
# into this: CC(2.0)+,EXO(BLT7),EXO(CFC6),EXO(E6D)  
# select Split('CC(2.0)+,EXO(BLT7),EXO(CFC6),EXO(E6D)',',') 

Does anyone have any ideas how I can do this? Any regex?

To break into several lines is easy, just use the split function.
But how to rewrite values ​​in parentheses like "EXO(BLT7,CFC6,E6D)" to "EXO(BLT7),EXO(CFC6),EXO(E6D)"?

I tried using the replace function but it started to become a little monster and I'm not able to handle all the cases.

As I said before, I was even able to divide it into several lines with the split function, but my difficulty is in treating these values ​​in parentheses

1

There are 1 answers

0
Mikhail Berlyant On

Use below approach

with temp as (
  select ColumnA, el,
    regexp_extract(el, r'(.*?)\(') el1, 
    split(regexp_extract(el, r'\((.*?)\)')) arr, 
    regexp_extract(el, r'\+$') el3  
  from your_table, unnest(regexp_extract_all(ColumnA, r'(.*?\(.*?\)\+?),?')) el 
  order by ColumnA 
)
select format('%s(%s)%s', el1, el2, if(array_length(arr) - 1 = pos, ifnull(el3, ''), '')) as ColumnB
from temp, unnest(arr) el2 with offset as pos    

if applied to sample data in your question - output is

enter image description here