Query for missing elements

139 views Asked by At

I have a table with the following structure:

timestamp | name | value
0         | john | 5
1         | NULL | 3
8         | NULL | 12
12        | john | 3
33        | NULL | 4
54        | pete | 1
180       | NULL | 4
400       | john | 3
401       | NULL | 4
592       | anna | 2

Now what I am looking for is a query that will give me the sum of the values for each name, and treats the nulls in between (orderd by the timestamp) as the first non-null name down the list, as if the table were as follows:

timestamp | name | value
0         | john | 5
1         | john | 3
8         | john | 12
12        | john | 3
33        | pete | 4
54        | pete | 1
180       | john | 4
400       | john | 3
401       | anna | 4
592       | anna | 2

and I would query SUM(value), name from this table group by name. I have thought and tried, but I can't come up with a proper solution. I have looked at recursive common table expressions, and think the answer may lie in there, but I haven't been able to properly understand those.

These tables are just examples, and I don't know the timestamp values in advance.

Could someone give me a hand? Help would be very much appreciated.

3

There are 3 answers

6
Thomas On BEST ANSWER
With Inputs As
    (
    Select 0 As [timestamp], 'john' As Name, 5 As value
    Union All Select 1, NULL, 3
    Union All Select 8, NULL, 12
    Union All Select 12, 'john', 3
    Union All Select 33, NULL, 4
    Union All Select 54, 'pete', 1
    Union All Select 180, NULL, 4
    Union All Select 400, 'john', 3
    Union All Select 401, NULL, 4
    Union All Select 592, 'anna', 2
    )
    , NamedInputs As
    (
    Select I.timestamp
        , Coalesce (I.Name
            ,   (
                Select I3.Name
                From Inputs As I3
                Where I3.timestamp = (
                                    Select Max(I2.timestamp)
                                    From Inputs As I2
                                    Where I2.timestamp < I.timestamp
                                        And I2.Name Is not Null
                                    )
                )) As name
        , I.value
    From Inputs As I
    )
Select NI.name, Sum(NI.Value) As Total
From NamedInputs As NI
Group By NI.name

Btw, what would be orders of magnitude faster than any query would be to first correct the data. I.e., update the name column to have the proper value, make it non-nullable and then run a simple Group By to get your totals.

Additional Solution

Select Coalesce(I.Name, I2.Name), Sum(I.value) As Total
From Inputs As I
    Left Join   (
                Select I1.timestamp, MAX(I2.Timestamp) As LastNameTimestamp
                From Inputs As I1
                    Left Join Inputs As I2
                        On I2.timestamp < I1.timestamp
                            And I2.Name Is Not Null   
                Group By I1.timestamp
                ) As Z
        On Z.timestamp = I.timestamp        
    Left Join Inputs As I2
        On I2.timestamp = Z.LastNameTimestamp
Group By Coalesce(I.Name, I2.Name)
4
RichardTheKiwi On

You don't need CTE, just a simple subquery.

select t.timestamp, ISNULL(t.name, (
    select top(1) i.name
    from inputs i
    where i.timestamp < t.timestamp
    and i.name is not null
    order by i.timestamp desc
    )), t.value
from inputs t

And summing from here

select name, SUM(value) as totalValue
from
(
    select t.timestamp, ISNULL(t.name, (
        select top(1) i.name
        from inputs i
        where i.timestamp < t.timestamp
        and i.name is not null
        order by i.timestamp desc
        )) as name, t.value
    from inputs t
) N
group by name
2
Andriy M On

I hope I'm not going to be embarassed by offering you this little recursive CTE query of mine as a solution to your problem.

;WITH
numbered_table AS (
  SELECT
    timestamp, name, value,
    rownum = ROW_NUMBER() OVER (ORDER BY timestamp)
  FROM your_table
),
filled_table AS (
  SELECT
    timestamp,
    name,
    value
  FROM numbered_table
  WHERE rownum = 1

  UNION ALL

  SELECT
    nt.timestamp,
    name = ISNULL(nt.name, ft.name),
    nt.value
  FROM numbered_table nt
    INNER JOIN filled_table ft ON nt.rownum = ft.rownum + 1
)
SELECT *
FROM filled_table
/* or go ahead aggregating instead */