Maven version ordering

307 views Asked by At

I have a table of Maven projects. Each project has a number of parameters and a version number.

When I select from the table I should only get the highest version, but due to the way maven versions look, this is tricky. This is the query I have so far:

select id, group_id as group, artifact_id as artifact
from (
   select 
     p.group_id,
     p.artifact_id,
     p.id,
     rank() over (partition by p.group_id, p.artifact_id order by p.version desc)
   from projects p
   ) as ranked
   where ranked.rank = 1

This does not give the highest version as the version does not follow a alphanumeric ordering.
The version format is described here.

The gist is that a version could be 1.2.3-SNAPSHOT, where 1 (major), 2 (minor), 3 (incremental) are numbers and should be ordered as such and SNAPSHOT (qualifier) is a string. If the version does not follow this format, it should be treated as a string.

Is this doable in PostgreSQL?

1

There are 1 answers

4
Erwin Brandstetter On

Parse the string. Like:

SELECT version
     , substring(version, '^(\d+)')::int AS major
     , substring(version, '^\d+\.(\d+)')::int AS minor
     , substring(version, '^\d+\.\d+\.(\d+)')::int AS incremental
     , substring(version, '-(.+)$') AS qualifier
FROM  (
   VALUES
     ('1.2.3-SNAPSHOT')
   , ('2-FOO')
   , ('2-BAR')
   , ('2.1-BAR')
   , ('13.5.6-SNAPSHOT')
   , ('13.11.11-SNAPSHOT')
   ) x(version)
ORDER  BY major NULLS LAST
        , minor NULLS FIRST
        , incremental NULLS FIRST
        , qualifier NULLS FIRST;

db<>fiddle here

  • substring(version, '^(\d+)') ... use substring() with regular expression patterns to parse.
    ^ ... start of string
    () ... capturing parentheses
    \d ... class shorthand for digits
  • substring(version, '^(\d+)')::int ... cast to integer to sort as number
  • major NULLS LAST ... versions without number go last (my assumption).
  • minor NULLS FIRST ... 2 comes before 2.1
  • NULLS LAST is the default ins ASC (ascending) sort order in ORDER BY and can be omitted.

You can use these expressions in ORDER BY directly (without adding them to the SELECT list). Just to demonstrate.

Advanced solution

For even more complex rules, you might want to use regexp_matches() - or, preferably, regexp_match() in Postgres 10 or later:

SELECT *
     , part[1] AS p1, part[2] AS p2, part[3] AS p3, part[4] AS p4
     , part[5] AS p5, part[6] AS p6, part[7] AS p7
FROM  (
   SELECT test_id, version
        , regexp_match(version
                     , '^(?:(\d+)(\w*)\.?(\d*)(\w*)\.?(\d*)(\w*))?(?:\-*(\w+))?') AS part
   FROM  (
      VALUES
        (1, '1.2.3-SNAPSHOT')
      , (2, '2-FOO')
      , (3, '2')
      , (4, '2-BAR')
      , (5, '2.1-BAR')
      , (6, '13.5.6-SNAPSHOT')
      , (7, '13.11.11-SNAPSHOT')
      , (8, '13.11a.11-SNAPSHOT')
      , (9, '13.11b.11')
      , (10, 'Test')
      , (11, 'TEST2')
      , (12, '1a')
      , (13, '1a.1a.1a-foo')
      , (14, '1a.1a.1a-')
      , (15, '1a.1a.1b-foo')
      , (16, 'sp9d8hgf')
      , (17, '2a-BAR')
      , (18, '2.1a-BAR')
      , (19, '2.1ab-BAR')
      , (20, 'incorrect1.2-foo')
      ) x(test_id, version)
   ) sub
ORDER  BY NULLIF(part[1], '')::int NULLS FIRST
        , NULLIF(part[2], '')      NULLS FIRST
        , NULLIF(part[3], '')::int NULLS FIRST
        , NULLIF(part[4], '')      NULLS FIRST
        , NULLIF(part[5], '')::int NULLS FIRST
        , NULLIF(part[6], '')      NULLS FIRST
        , part[7]                  NULLS FIRST;

db<>fiddle here

This deals with all additional rules from your comment.

regexp_matches() is a powerful tool, but you need basic understanding of regular expressions. Test if in doubt.
Note in particular:

  • Not adding the g switch. More here.
  • NULLIF(part[1],'')::int ... non-matches are listed as empty strings in the array. Need to be converted to NULL before casting to integer

Be aware of the difference between capturing () and non-capturing parentheses (:?). The former also report a match, while the latter does not.