Regexp to extract table schema from create table statement

1.1k views Asked by At

I can't wrap my head around this apparently simple regex problem; say I have the following CREATE TABLE statement:

CREATE TABLE foo (col1 text PRIMARY KEY NOT NULL, col2 text, col3 text)

What I'm struggling to achieve is basically write a javascript regular expression which will return these 3 matches:

1. `col1 text PRIMARY KEY NOT NULL`
2. `col2 text`
3. `col3 text`

which would be the starting point for further reasoning I will not go into.
On this awesome site I tried pasting the following regex:

/CREATE TABLE (?:\w+)\s\(((?:\w+?)\s(?:\w+?)(?:\sPRIMARY KEY NOT NULL)?(?:,\s)?)+\)/

Testing it against the example statement, it only returns one match, the last one:

1. `col3 text`

In the "Explanation" side pane there is a note saying:

A repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations or use a non-capturing group instead if you're not interested in the data

So I tried modifying my regex accordingly, by adding a capturing group around the repeated group (after the first literal "(" and before the corresponding ")") and it now reads:

/CREATE TABLE (?:\w+)\s\((((?:\w+?)\s(?:\w+?)(?:\sPRIMARY KEY NOT NULL)?(?:,\s)?)+)\)/

But still the results are not those I require, since the above yields:

1. `col1 text PRIMARY KEY NOT NULL, col2 text, col3 text`
2. `col3 text`

Plus, the above note about only captuing the last iteration, which I thought would go away after grouping the repeated group, is still there, only it's on the 2nd capturing group (which got appended to the first when I added the extra set of brackets).
Please help, I'm kinda stumped. Cheers.

1

There are 1 answers

3
SoEzPz On

Perhaps this will work?

var string = "CREATE TABLE foo (col1 text PRIMARY KEY NOT NULL, col2 text, col3 text)"
var regex = /\(([\w|\s]*),\s*([\w|\s]*),\s*([\w|\s]*)\)/;

string.match(regex).slice(1);
// result: ["col1 text PRIMARY KEY NOT NULL", "col2 text", "col3 text"];

Note: .slice(1) just removes the original string, which is automatically included when using String.prototype.match.

or more clearly...

  1. col1 text PRIMARY KEY NOT NULL
  2. col2 text
  3. col3 text

--UPDATED ANSWER--

Using the following regex

var regex = /(\w\s*\w+)*(?=\)|,)/gi;

On the following strings

var stringOne = "CREATE TABLE foo (col1 text PRIMARY KEY NOT NULL)"
var stringTwo = "CREATE TABLE foo (col1 text PRIMARY KEY NOT NULL, col2 text)"
var stringThree = "CREATE TABLE foo (col1 text PRIMARY KEY NOT NULL, col2 text, col3 text, col4 text, col5 text, col6, text, col7 text, col8 text)"

Will result in the following matches

NOTE: Using String.prototype.match results in a returned array with empty strings; as many empty strings as there were matches. This is not part of the regex as far as I can tell, but part of using .match(). Thus the matched array may require an additional step to remove "''".

--

ex. initial return value of "["col1 text PRIMARY KEY NOT NULL", "", "col2 text", ""]" would need the two "''" elements removed to get the closest answer for your question. However, you many not care they are there so removing them may not be required. I will remove them in the below examples using .filter();

function emptyStrings( string ){
  return !!string;
}

Then the following variables will return

stringOne.match(regex).filter(emptyStrings);
// => ["col1 text PRIMARY KEY NOT NULL"]

stringTwo.match(regex).filter(emptyStrings);
// => ["col1 text PRIMARY KEY NOT NULL", "col2 text"]

stringThree.match(regex).filter(emptyStrings);
// => ["col1 text PRIMARY KEY NOT NULL", "col2 text", "col3 text", "col4 text", "col5 text", "col6", "text", "col7 text", "col8 text"]

This updated regex should continue to match up to the limit of the JS language, however using match will also result in the additional "''" empty strings per match. Just remove them if they are an issue as I have shown, if that helps...