How to do inheritance / transmission queries in BigQuery Variant Schema

Question

How to do inheritance / transmission queries in BigQuery Variant Schema

254 views Asked by user553965 At 03 August 2018 at 00:07

The Variant Schema used by Google Genomics Variant Transform pipelines represents genotypes as nested records in BigQuery - for example:

(from: https://bigquery.cloud.google.com/table/genomics-public-data:1000_genomes.variants?pli=1&tab=preview)

I'm having trouble understanding how to write queries that involve relationships between samples - such as:

select all variants where sampleA.genotype=HET and sampleB.genotype=HET and sampleC.genotype=HOM-ALT

or similar queries where sampleA and sampleB are parents of sampleC and you're looking for variants that follow a particular inheritance pattern.

How are people writing these queries with the nested schema?

Original Q&A

There are 1 answers

**Mikhail Berlyant** · Accepted Answer · 2018-08-03T01:15:44+00:00

I think that would be something like below - have not tested as table is quite expensive - but one run gave zero output meaning that there is no records that meet that specific criteria - but at least you see the logic of how to do such query

SELECT * EXCEPT(cnt)
FROM (
    SELECT reference_name,  start, `end`,
        (SELECT COUNT(1) 
            FROM UNNEST(call) 
            WHERE (call_set_name="HG00261" AND genotype[SAFE_OFFSET(0)] = 0 AND genotype[SAFE_OFFSET(1)] = 1)
            OR (call_set_name="HG00593" AND genotype[SAFE_OFFSET(0)] = 1 AND genotype[SAFE_OFFSET(1)] = 0)
            OR (call_set_name="NA12749 " AND genotype[SAFE_OFFSET(0)] = 1 AND genotype[SAFE_OFFSET(1)] = 1)
        ) cnt
    FROM `genomics-public-data.1000_genomes.variants` 
)
WHERE cnt = 3

TechQA.

How to do inheritance / transmission queries in BigQuery Variant Schema

There are 1 answers

Related Questions in GOOGLE-BIGQUERY

Related Questions in GOOGLE-GENOMICS

Popular Questions

Popular Tags

Trending Questions