Extract number between two substrings in sql

500 views Asked by At

I had a previous question and it got me started but now I'm needing help completing this. Previous question = How to search a string and return only numeric value?

Basically I have a table with one of the columns containing a very long XML string. There's a number I want to extract near the end. A sample of the number would be this...

<SendDocument DocumentID="1234567">true</SendDocument>

So I want to use substrings to find the first part = true so that Im only left with the number.

What Ive tried so far is this:

SELECT SUBSTRING(xml_column, CHARINDEX('>true</SendDocument>', xml_column) - CHARINDEX('<SendDocument',xml_column) +10087,9) 

The above gives me the results but its far from being correct. My concern is that, what if the number grows from 7 digits to 8 digits, or 9 or 10?

In the previous question I was helped with this:

SELECT SUBSTRING(cip_msg, CHARINDEX('<SendDocument',cip_msg)+26,7)

and thats how I got started but I wanted to alter so that I could subtract the last portion and just be left with the numbers.

So again, first part of the string that contains the digits, find the two substrings around the digits and remove them and retrieve just the digits no matter the length.

Thank you all

2

There are 2 answers

2
Michael O'Connor On BEST ANSWER

You should be able to setup your SUBSTRING() so that both the starting and ending positions are variable. That way the length of the number itself doesn't matter.

From the sound of it, the starting position you want is right After the "true"

The starting position would be:

CHARINDEX('<SendDocument DocumentID=', xml_column) + 25
((adding 25 because I think CHARINDEX gives you the position at the beginning of the string you are searching for))

Length would be:

CHARINDEX('>true</SendDocument>',xml_column) - CHARINDEX('<SendDocument DocumentID=', xml_column)+25
((Position of the ending text minus the position of the start text))

So, how about something along the lines of:

SELECT SUBSTRING(xml_column, CHARINDEX('<SendDocument DocumentID=', xml_column)+25,(CHARINDEX('>true</SendDocument>',xml_column) - CHARINDEX('<SendDocument DocumentID=', xml_column)+25))
6
Mike On

Have you tried working directly with the xml type? Like below:

DECLARE @TempXmlTable TABLE
(XmlElement xml )

INSERT INTO @TempXmlTable
select Convert(xml,'<SendDocument DocumentID="1234567">true</SendDocument>')



SELECT
element.value('./@DocumentID', 'varchar(50)') as DocumentID
FROM
@TempXmlTable CROSS APPLY
XmlElement.nodes('//.') AS DocumentID(element)
WHERE   element.value('./@DocumentID', 'varchar(50)')  is not null

If you just want to work with this as a string you can do the following:

DECLARE @SearchString varchar(max) = '<SendDocument DocumentID="1234567">true</SendDocument>'
DECLARE @Start int = (select CHARINDEX('DocumentID="',@SearchString)) + 12 -- 12 Character search pattern
DECLARE @End int = (select CHARINDEX('">', @SearchString)) - @Start --Find End Characters and subtract start position

SELECT SUBSTRING(@SearchString,@Start,@End)

Below is the extended version of parsing an XML document string. In the example below, I create a copy of a PLSQL function called INSTR, the MS SQL database does not have this by default. The function will allow me to search strings at a designated starting position. In addition, I'm parsing a sample XML string into a variable temp table into lines and only looking at lines that match my search criteria. This is because there may be many elements with the words DocumentID and I'll want to find all of them. See below:

IF EXISTS (select * from sys.objects where name = 'INSTR' and type = 'FN')
DROP FUNCTION [dbo].[INSTR]
GO

CREATE FUNCTION [dbo].[INSTR] (@String VARCHAR(8000), @SearchStr VARCHAR(255), @Start INT, @Occurrence INT)
RETURNS INT
AS
BEGIN
DECLARE @Found INT = @Occurrence,
@Position INT = @Start;

WHILE 1=1
BEGIN
-- Find the next occurrence
SET @Position = CHARINDEX(@SearchStr, @String, @Position);

-- Nothing found
IF @Position IS NULL OR @Position = 0
RETURN @Position;

-- The required occurrence found
IF @Found = 1
BREAK;

-- Prepare to find another one occurrence
SET @Found = @Found - 1;
SET @Position = @Position + 1;
END

RETURN @Position;
END
GO

--Assuming well formated xml
DECLARE @XmlStringDocument varchar(max) =   '<SomeTag Attrib1="5">
                                            <SendDocument DocumentID="1234567">true</SendDocument>
                                            <SendDocument DocumentID="1234568">true</SendDocument>
                                            </SomeTag>'

--Split Lines on this element tag
DECLARE @SplitOn nvarchar(25) = '</SendDocument>' 

--Let's hold all lines in Temp variable table
DECLARE @XmlStringLines TABLE
    (
        Value nvarchar(100)
    ) 

        While (Charindex(@SplitOn,@XmlStringDocument)>0)
        Begin

            Insert Into @XmlStringLines (value)
            Select 
                Value = ltrim(rtrim(Substring(@XmlStringDocument,1,Charindex(@SplitOn,@XmlStringDocument)-1)))

            Set @XmlStringDocument = Substring(@XmlStringDocument,Charindex(@SplitOn,@XmlStringDocument)+len(@SplitOn),len(@XmlStringDocument))
        End

        Insert Into @XmlStringLines (Value)
        Select Value = ltrim(rtrim(@XmlStringDocument))

    --Now we have a table with multple lines find all Document IDs
    SELECT 
    StartPosition = CHARINDEX('DocumentID="',Value) + 12,
    --Now lets use the INSTR function to find the first instance of '">' after our search string
    EndPosition = dbo.INSTR(Value,'">',( CHARINDEX('DocumentID="',Value)) + 12,1),
    --Now that we know the start and end lets use substring
    Value = SUBSTRING(value,( 
                -- Start Position
                CHARINDEX('DocumentID="',Value)) + 12, 
                    --End Position Minus Start Position
                dbo.INSTR(Value,'">',( CHARINDEX('DocumentID="',Value)) + 12,1) - (CHARINDEX('DocumentID="',Value) + 12))
    FROM 
        @XmlStringLines 
    WHERE Value like '%DocumentID%' --Only care about lines with a document id