i am trying to fetch some text from HTML string from a table. the text are like this and i am trying to get the output
ID,TEXT
1,<font face="Microsoft Sans Serif" size="8" color="#000000">ABC 123</font>
2,<font face="Microsoft Sans Serif" size="8" color="#000000">1 ETG</font>
3,<font face="Microsoft Sans Serif" size="8" color="#000000">Gatelys PERC</font>
4,<font face="Microsoft Sans Serif" size="8" color="#000000">Gml TK 144</font>
ID,TEXT
1,ABC 123
2,1 ETG
3,Gatelys PERC
4,Gml TK 144
I have tried this but not getting right output.
select ID,regexp_substr(TEXT, '[^>]+',1,2) from TABLE
where TEXT is not null;
If your HTML is always valid XHTML then you should use XML functions to parse the string:
Which, for the sample data:
Outputs:
If you want to naively remove all tags using string functions then you can use:
Which, for your sample data, outputs:
Note, it will fail if you have attributes inside a tag that contain a
>character.If you want a more comprehensive query then:
Which outputs the same as the XML query.
fiddle