I am building an application that logs automotive performance metrics to a table of 100 columns × 500,000 rows per session. Some columns are highly repetitive (e.g., coolant temperature in whole degrees Fahrenheit), while others change continuously (spark advance, manifold pressure, air/fuel ratio).
Columnstore indexes and page compression are out of the question, since the project is targeted at an open source audience: it needs to run on MS SQL Server Express Edition or another free database platform that scales well to large tables.
My initial solution is to allow NULL in some of the columns and skip inserting repeated values, which should dramatically reduce the storage footprint and let me increase the log resolution to a higher sample rate.
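For concreteness, a rough sketch of the sparse table I have in mind (table and column names are placeholders; id is a monotonically increasing sample number). SQL Server's SPARSE column attribute, which is available in Express, might also help here, since it stores NULLs in zero space at the cost of a few extra bytes per non-NULL value:

CREATE TABLE [values] (
    id   INT IDENTITY(1,1) PRIMARY KEY,
    val1 SMALLINT SPARSE NULL,  -- repetitive channel, e.g. coolant temp (°F)
    val2 REAL NULL,             -- continuous channel, e.g. spark advance
    val3 REAL NULL              -- continuous channel, e.g. manifold pressure
);

-- A channel is written only when its reading changes; otherwise it stays NULL.
INSERT INTO [values] (val1, val2, val3) VALUES (195,  12.5, 98.2);
INSERT INTO [values] (val1, val2, val3) VALUES (NULL, 13.0, 97.8); -- coolant unchanged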
However, this introduces an obstacle when selecting discrete rows, because certain columns will be 99% NULL. It is therefore necessary to create a view (or a computed column?) that fills in each missing value from the last row that contained a value in that field. My approach is to use a correlated subquery for each sparse column. This seems grossly inelegant. Is there a more efficient approach I should consider?
SELECT
    ISNULL(v.val1, (
        SELECT TOP 1 subv.val1
        FROM [values] subv
        WHERE subv.id <= v.id
          AND subv.val1 IS NOT NULL
        ORDER BY subv.id DESC
    )) AS val1,
    ISNULL(v.val2, (
        SELECT TOP 1 subv.val2
        FROM [values] subv
        WHERE subv.id <= v.id
          AND subv.val2 IS NOT NULL
        ORDER BY subv.id DESC
    )) AS val2,
    ISNULL(v.val3, (
        SELECT TOP 1 subv.val3
        FROM [values] subv
        WHERE subv.id <= v.id
          AND subv.val3 IS NOT NULL
        ORDER BY subv.id DESC
    )) AS val3
FROM [values] v;
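For comparison, the same fill-forward can be expressed with window functions (SQL Server 2012+, including Express), avoiding one correlated subquery per column per row. This is only a sketch against the hypothetical schema above: a running COUNT(col) OVER (ORDER BY id) ignores NULLs, so each non-NULL reading opens a new group, and MAX over that group carries the value forward:

SELECT id,
       MAX(val1) OVER (PARTITION BY grp1) AS val1,
       MAX(val2) OVER (PARTITION BY grp2) AS val2,
       MAX(val3) OVER (PARTITION BY grp3) AS val3
FROM (
    SELECT id, val1, val2, val3,
           -- COUNT(column) skips NULLs, so the running count only advances
           -- on rows where the channel actually has a value.
           COUNT(val1) OVER (ORDER BY id) AS grp1,
           COUNT(val2) OVER (ORDER BY id) AS grp2,
           COUNT(val3) OVER (ORDER BY id) AS grp3
    FROM [values]
) grouped;

Within each group only the first row is non-NULL, so MAX (or MIN) returns exactly the last observed value.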
Maybe it would be better to create a new table instead, containing only three columns, e.g. MetricType, MetricValue, MeasurementTime, and insert a new measurement value only when the value changes for a particular metric.
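A minimal sketch of that key-value layout (names and types are assumptions), including one way to reconstruct the last known state of every metric at a given instant:

CREATE TABLE MetricLog (
    MetricType      SMALLINT  NOT NULL,  -- key into a metric lookup table
    MetricValue     REAL      NOT NULL,
    MeasurementTime DATETIME2 NOT NULL,
    PRIMARY KEY (MetricType, MeasurementTime)
);

-- Last known value of every metric at or before @asOf:
DECLARE @asOf DATETIME2 = SYSDATETIME();
SELECT m.MetricType, m.MetricValue
FROM MetricLog m
JOIN (
    SELECT MetricType, MAX(MeasurementTime) AS LastTime
    FROM MetricLog
    WHERE MeasurementTime <= @asOf
    GROUP BY MetricType
) latest
  ON latest.MetricType = m.MetricType
 AND latest.LastTime   = m.MeasurementTime;

The trade-off is that reconstructing a full-width row for a time range requires a pivot or per-metric join, so wide range scans get more expensive than with the sparse-column layout.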