BigQuery slower than SQL Server


I am comparing query execution times in my data marts between SQL Server 2016 (star schema, with a clustered columnstore index) and BigQuery (one single denormalized table). I have around 20 million rows. Here is my query: I need month-wise sums of oil, gas, and water volumes. There is one entry of volumes per day, per entity, for 10 years, and I have 6,500 entities, so the total row count is 6,500 (entities) × 10 (years) × 365 (days) = 23,725,000 (ignoring leap years for now).
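The row-count arithmetic above can be sanity-checked in a couple of lines of Python:

```python
# One measurement row per day, per entity, over 10 years
# (leap years ignored, as in the question).
entities = 6500
years = 10
days_per_year = 365

total_rows = entities * years * days_per_year
print(total_rows)  # 23725000
```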

BigQuery - legacy SQL (taking 5 seconds)

SELECT [ASSET] AS [ASSET],
       SUM([Measurements.GAS]) AS [sum_Measurements_GAS_ok],
       SUM([Measurements.OIL]) AS [sum_Measurements_OIL_ok],
       SUM([Measurements.WATER]) AS [sum_Measurements_WATER_ok],
       STRFTIME_UTC_USEC(TIMESTAMP(TIMESTAMP_TO_MSEC(Measurements.DATE)), '%Y-%m') AS [month]
FROM [datamanager-dashboard:bigquerysample.initial_va_schema_v3] [initial_va_schema_v3]
GROUP BY 1, 5
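For comparison, the same aggregation in BigQuery standard SQL would look roughly like the sketch below (assuming the repeated `Measurements` record from the schema further down; the project, dataset, and table names are taken from the legacy query above):

```sql
-- Standard SQL sketch of the same month-wise aggregation.
-- UNNEST flattens the repeated Measurements record into one row per measurement.
SELECT
  ASSET,
  FORMAT_TIMESTAMP('%Y-%m', m.DATE) AS month,
  SUM(m.GAS)   AS sum_gas,
  SUM(m.OIL)   AS sum_oil,
  SUM(m.WATER) AS sum_water
FROM `datamanager-dashboard.bigquerysample.initial_va_schema_v3`,
     UNNEST(Measurements) AS m
GROUP BY ASSET, month
```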

SQL Server 2016 - Star Schema (taking 2 seconds) - running inside VM in Google Compute Engine - n1-standard-4

SELECT [dim_asset_types].[asset_name] AS Asset,
            SUM([fact_well_volume_events].[gas]) AS [sum:gas:ok],
            SUM([fact_well_volume_events].[oil]) AS [sum:oil:ok],
            SUM([fact_well_volume_events].[water]) AS [sum:water:ok],
            DATEADD(month, DATEDIFF(month, 0, [fact_well_volume_events].[measurement_date]), 0) AS [tmn:measurement_date:ok]
            FROM [dbo].[dim_asset_types] [dim_asset_types]
            INNER JOIN [dbo].[xref_well_to_asset_type] [xref_well_to_asset_type] ON ([dim_asset_types].[dim_asset_type_key] = [xref_well_to_asset_type].[dim_asset_type_key])
            INNER JOIN [dbo].[dim_wells] [dim_wells] ON ([xref_well_to_asset_type].[dim_well_key] = [dim_wells].[dim_well_key])
            INNER JOIN [dbo].[fact_well_volume_events_with_calculations] [fact_well_volume_events] ON ([dim_wells].[dim_well_key] = [fact_well_volume_events].[dim_well_key])
            GROUP BY [dim_asset_types].[asset_name], DATEADD(month, DATEDIFF(month, 0, [fact_well_volume_events].[measurement_date]), 0)
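The `DATEADD(month, DATEDIFF(month, 0, d), 0)` idiom in the T-SQL query truncates a date to the first day of its month, which is what makes the month-wise grouping work. A rough Python equivalent of that logic, for illustration only:

```python
from datetime import date

def truncate_to_month(d: date) -> date:
    """Return the first day of d's month, mirroring the
    DATEADD(month, DATEDIFF(month, 0, d), 0) idiom in T-SQL."""
    return d.replace(day=1)

print(truncate_to_month(date(2016, 7, 19)))  # 2016-07-01
```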

I gave only a single example, but this happens with all sorts of different queries. Am I missing something? Why is BigQuery so slow?

EDIT: I am attaching a sample schema (it is not complete):

[
  {
    "name": "ASSET",
    "type": "STRING"
  },
  {
    "name": "Measurements",
    "type": "RECORD",
    "mode": "REPEATED",
    "fields": [
      {
        "name": "DATE",
        "type": "TIMESTAMP"
      },
      {
        "name": "OIL",
        "type": "FLOAT"
      },
      {
        "name": "WATER",
        "type": "FLOAT"
      },
      {
        "name": "GAS",
        "type": "FLOAT"
      }
    ]
  }
]

1 Answer

Answered by Javier Montón

BigQuery is not going to give you better results than SQL Server for small queries over data marts. BigQuery works very well on huge queries, but every query has a fixed overhead of a few seconds: it needs to create an execution plan and distribute it across many machines, and that takes time.

The difference between SQL Server and BigQuery is that with BigQuery you can run huge queries over your raw data (without data marts), and for queries where SQL Server might take minutes or hours, BigQuery will likely take less than a minute.