Separate a string with SUBSTRING and PATINDEX - last step

Question

101 views Asked by Alexander Knapp At 26 October 2023 at 12:49

I need to finalize a query. The query returns a column which contains values like "P100+P200" or "SUMME(P400:P1200)".

In the end, the result should be:

Column A	Column B	Column C
P100	+	P200
P400	:	P1200

I have managed to extract column A and column B.

For the first two steps I used this code:

MAX (SUBSTRING(t3.formel, PATINDEX('%[A-Z][0-9]%', t3.formel), PATINDEX('%[+:-]%', SUBSTRING(t3.formel, PATINDEX('%[A-Z][0-9]%', t3.formel), LEN(t3.formel))) - 1)) "Formelteil 1",
MAX (SUBSTRING(t3.formel, PATINDEX('%[+:.-]%', t3.formel), 1) ) AS Sonderzeichen

But I seem to be blind to the solution for the third step.

There are 2 answers

Panagiotis Kanavos On 27 October 2023 at 08:34

T-SQL isn't a text manipulation language and doesn't even have regular expressions. It's a lot easier to do this task in a client language, using a regular expression like ([A-Z\d]+)([+:.-])([A-Z\d]+) to capture the three parts.
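As a minimal sketch of how that pattern behaves on single values, the following uses Python's standard `re` module. The helper name `split_formula` is hypothetical, and the sample values are the ones from the question.

```python
import re

# Pattern from the answer: operand, operator, operand.
PATTERN = re.compile(r"([A-Z\d]+)([+:.-])([A-Z\d]+)")

def split_formula(formel):
    """Return (left, operator, right), or None if nothing matches."""
    match = PATTERN.search(formel)
    return match.groups() if match else None

print(split_formula("P100+P200"))
print(split_formula("SUMME(P400:P1200)"))
```

Because `search` scans for the first position where all three groups match, the leading `SUMME(` is skipped and the trailing `)` is never captured.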

In the comments you mention the data is used in Power BI. You can use a Python transformation in the Query editor to apply a regular expression to the data using pandas' str.extract and automatically extract the parts into columns.

The Power BI step script is essentially a one-liner:

import pandas as pd
pattern=r"([A-Z\d]+)([+:.-])([A-Z\d]+)"
dataset[['a','b','c']]=dataset['formel'].str.extract(pattern)

str.extract applies the regular expression to all the values of the formel column (Series) and extracts each capture group into a separate column. dataset[['a','b','c']]= stores those columns in the original dataset using the names a, b and c.

You can easily test Python scripts in the command line or a Jupyter Notebook in VS Code.

The following script, run in either the Python command line or a notebook in VS Code:

import pandas as pd
dataset=pd.DataFrame({'formel':['P100+P400','SUMME(P200:P300)']})

pattern=r"([A-Z\d]+)([+:.-])([A-Z\d]+)"
dataset[['a','b','c']]=dataset['formel'].str.extract(pattern)
dataset

Prints

             formel     a  b     c
0         P100+P400  P100  +  P400
1  SUMME(P200:P300)  P200  :  P300

**Patrick Hurst** · Accepted Answer · 2023-10-26T14:45:38+00:00

As mentioned in the comments, this is not really a job for SQL Server.

When asking questions like this it's helpful to provide example DDL/DML:

DECLARE @Table TABLE (formel NVARCHAR(100));
INSERT INTO @Table (formel) VALUES 
('P100+P200'), ('G100/G200'), ('a100*z200'), ('P1005-P2005'), ('SUMME(P400:P1200)');

You're two thirds of the way there. Since the only extra character we seem to need to worry about is the closing parenthesis, we can take the position of the operator + 1 as the start of the last part, pass a length that is at least as large as the number of remaining characters (here simply LEN(t3.formel)), and then remove the parenthesis by replacing it with an empty string:

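SELECT t3.formel,
       -- a: from the first letter+digit pair up to (not including) the operator
       SUBSTRING(t3.formel, PATINDEX('%[A-Za-z][0-9]%', t3.formel), PATINDEX('%[-*/+:]%', t3.formel) - PATINDEX('%[A-Za-z][0-9]%', t3.formel)) AS a,
       -- b: the operator itself
       SUBSTRING(t3.formel, PATINDEX('%[-*/+:]%', t3.formel), 1) AS b,
       -- c: everything after the operator, with any closing parenthesis removed
       REPLACE(SUBSTRING(t3.formel, PATINDEX('%[-*/+:]%', t3.formel) + 1, LEN(t3.formel)), ')', '') AS c
  FROM @Table t3;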

formel	a	b	c
P100+P200	P100	+	P200
G100/G200	G100	/	G200
a100*z200	a100	*	z200
P1005-P2005	P1005	-	P2005
SUMME(P400:P1200)	P400	:	P1200
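To make the position arithmetic easier to follow, here is a rough Python sketch of the same three steps. It is not part of the original answer: the function name `split_by_position` is hypothetical, and Python indexes are 0-based whereas PATINDEX is 1-based, so the offsets differ slightly from the T-SQL.

```python
import re

def split_by_position(formel):
    """Mimic the SUBSTRING/PATINDEX steps using 0-based Python indexes."""
    # Like PATINDEX('%[A-Za-z][0-9]%', ...): start of the first operand.
    start = re.search(r"[A-Za-z][0-9]", formel).start()
    # Like PATINDEX('%[-*/+:]%', ...): position of the operator.
    op = re.search(r"[-*/+:]", formel).start()
    a = formel[start:op]                    # operand before the operator
    b = formel[op]                          # the operator itself
    c = formel[op + 1:].replace(")", "")    # rest of the string, minus ')'
    return a, b, c

for value in ["P100+P200", "G100/G200", "a100*z200", "P1005-P2005", "SUMME(P400:P1200)"]:
    print(value, split_by_position(value))
```

Like the T-SQL, this sketch assumes every value contains a letter followed by a digit and exactly one operator. A value with neither would raise an AttributeError here, so real data may need a guard.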
