I'm looking to rank/aggregate conversation data in SQL (specifically BigQuery). The data is conversation data where each row represents one sentence. In the image below I have added the example data for speaker, sentence, and sequence_start. The desired_rank is the target outcome (or something similar).
I believe there should be a window function like a rank/lag/first that should get to the desired rank programmatically.
The closest I got was initially was the following:
WITH DATA AS (
SELECT 'Speaker A' as speaker, 'Sentence 1' as sentence, 1 as sentence_start, 1 as desired_rank
UNION ALL SELECT 'Speaker A' as speaker, 'Sentence 2' as sentence, 9 as sentence_start, 1 as desired_rank
UNION ALL SELECT 'Speaker B' as speaker, 'Sentence 3' as sentence, 27 as sentence_start, 2 as desired_rank
UNION ALL SELECT 'Speaker C' as speaker, 'Sentence 4' as sentence, 46 as sentence_start, 3 as desired_rank
UNION ALL SELECT 'Speaker A' as speaker, 'Sentence 5' as sentence, 78 as sentence_start, 4 as desired_rank
)
SELECT speaker, sentence, sentence_start, desired_rank,
FIRST_VALUE(sentence_start)
OVER (
PARTITION BY speaker
ORDER BY sentence_start
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM DATA
ORDER BY sentence_start
The issue with the result is that Speaker A is always ranked as 1, where it should be 4 (or something similar).
Your help is appreciated. Thanks!
Figured it out. Needed to join down to the next row to determine the change. Added a complication where speaker A speaks for sentence 5 and 6.