Filtering Autodesk Vault vertical data, getting the newest record for each drawing

459 views Asked by At

Using Microsoft SQL Server Express Edition (64-bit) 10.0.550.0

I'm trying to extract data from an Autodesk Vault server. The SQL involved to get to the required data is too advanced for my current level of knowledge, so I'm trying to lay a puzzle using bits from Google and StackOverflow as pieces. Using this excellent answer I was able to transpose the vertical data into a manageable horizontal format.

The Autodesk Vault database stores information about CAD drawings (among other things). The main vertical table dbo.Property holds information about all the different revisions of each CAD drawing. The problem I'm currently facing is that I'm getting too much data. I just want the data from the latest revision of each CAD drawing.

Here's my SQL so far

select 
    CreateDate,
    EntityID,
    PartNumber,
    CategoryName,
    [Subject],
    Title
from 
(
    select 
        EntityID,
        CreateDate,
        [53] as PartNumber,
        [28] as CategoryName,
        [42] as [Subject],
        [43] as Title
    from 
    (
        select
            p.Value, 
            p.PropertyDefID,
            p.EntityID,
            e.CreateDate
        from dbo.Property as p
        inner join dbo.Entity as e on p.EntityID = e.EntityId
        where p.PropertyDefID in(28, 42, 43, 53)
        and e.EntityClassID = 8
    ) t1
    pivot 
    (
        max(Value)
        for PropertyDefID in([28], [42], [43], [53])
    ) t2
) t3
where PartNumber is not null
and PartNumber != ''
and CategoryName = 'Drawing'
-- (1) additional condition
order by PartNumber, CreateDate desc

Where dbo.Property.Value is of sql_variant datatype. The query above results in a data set similar to this:

CreateDate | EntityID | PartNumber | CategoryName | Subject | Title 
---------------------------------------------------------------------
2016-01-01 |    59046 |      10001 | Drawing      | Xxxxx   | Yyyyy
2016-05-01 |    60137 |      10001 | Drawing      | Xxxxx   | Yyyyy
2016-08-01 |    62518 |      10001 | Drawing      | Xxxx    | Yyyyyy
2016-12-16 |    63007 |      10001 | Drawing      | Xxxxxx  | Yyyyyy
2016-01-01 |    45776 |      10002 | Drawing      | Zzzzz   | NULL  
2016-11-01 |    65011 |      10002 | Drawing      | Zzzzzz  | NULL  
...
(about 23000 rows)

The problem that I have is that I'm getting all revisions for each drawing. In the example above I only want the latest revision for PartNumber=10001 dated '2016-12-16' etc.

I have also looked at this answer on how to group and select rows where one of the columns has a max value, but I just can't seem to figure out how to combine the two. I tried adding the following snippet to the commented line in the above query, but it fails on many different levels.

and (PartNumber, CreateDate) in 
(
    select PartNumber, max(CreateDate)
    from t3
    group by PartNumber 
)

The reason I'm tagging this question "pivot", although the pivoting is already done, is that I suspect that the pivoting is what's causing me trouble. I just haven't been able to wrap my head around this pivoting stuff yet, and my SQL optimization skills are seriously lacking. Maybe the filtering should be done at an inner level?

1

There are 1 answers

0
Tomas Eklund On BEST ANSWER

Drawing inspiration from the comment provided by @Strawberry, I kept working and tweaking until I got something that seems to work. I had to use a PIVOT inside a PIVOT for it all to work.

Edit: At first I used views, but then the prerequisites changed as I had to work with a read-only database user. Fortunately, I was still allowed to create temporary tables.

This is the final result.

if object_id('tempdb.dbo.#Properties', 'U') is not null
    drop table #Properties

create table #Properties 
(
    PartNumber  nvarchar(max),
    [Subject]   nvarchar(max),
    Title       nvarchar(max),
    CreateDate  datetime
)

insert into #Properties
(
    PartNumber,
    [Subject],
    Title,
    CreateDate
)
select 
    convert(nvarchar(max), PartNumber),
    convert(nvarchar(max), [Subject]), 
    convert(nvarchar(max), Title),
    convert(datetime, CreateDate)
from 
(
    select 
        EntityID,
        CreateDate,
        [53] as PartNumber,
        [42] as [Subject],
        [43] as Title
    from 
    (
        select
            p.Value, 
            p.PropertyDefID,
            p.EntityID,
            e.CreateDate
        from dbo.Property as p
        inner join dbo.Entity as e on p.EntityID = e.EntityId
        where p.PropertyDefID in (42, 43, 53)
        and e.EntityClassID = 8
        and p.EntityID in
        (
            select 
                max(EntityID) as MaxEntityID
            from 
            (
                select 
                    EntityID,
                    [28] as CategoryName,
                    [53] as PartNumber
                from
                (
                    select
                        p.Value,
                        p.EntityID,
                        p.PropertyDefID
                    from dbo.Property as p
                    inner join dbo.Entity as e on p.EntityID = e.EntityId
                    where p.PropertyDefID in (28, 53)
                    and e.EntityClassID = 8 -- FileIteration
                ) as t1
                pivot
                (
                    max(Value)
                    for PropertyDefID in ([28], [53])
                ) as t2
            ) as t3
            where CategoryName = 'Drawing'
            group by PartNumber
        )
    ) as t4
    pivot 
    (
        max(Value)
        for PropertyDefID in ([42], [43], [53])
    ) as t5
) as t6
where PartNumber is not null
and PartNumber != ''
order by PartNumber

select * from #Properties;
-- search conditions goes here

I had to change the suggested join to a where x in(y) because the join was insanely slow (I terminated the query after four minutes). Now the resulting data set (which takes ~2 seconds to produce) looks promising:

PartNumber | Subject | Title  | CreateDate       | ...
-----------------------------------------------------------------------
100000     | Xxxxxx  | Yyyyyy | 2015-08-17 09-10 | ...
100001     | Zzzzzz  | NULL   | 2015-09-02 15-23 | ...
...
(about 8900 rows)

No more old revisions in the set.