Looker explore joins are too complex to support reporting


I have a Looker model I'm building that I want to run a dashboard off of.

The core issue is that I have two views I want to join together, but they have so many similar fields that I can't accomplish the join without joining one to the other with a full outer join.

These are the views I'm trying to join together:

  • sessions.view
  • orders.view
  • items.view
  • id_lookup.view
  • lifetime_value.view
  • facebook_spend.view
  • google_spend.view

Primarily it's sessions and orders I'm having the problem with. They have a bunch of fields to join on (e.g. date, source, medium, campaign, term, landing page), and my problem is that I have to set up an OR filter group to include the data from both session counts and orders, even though the orders join is a full outer join.

Here's the model:

a_model.model.lookml

connection: "bigquery"

label: "A Model's Explores"

include: "/path/to/dir/*.view"

persist_with: default_datagroup

datagroup: default_datagroup {
  max_cache_age: "15 minutes"
}

explore: an_explore {
    from: session_counts

    join: orders {
      relationship: one_to_many
      type: full_outer
      view_label: "Order Details"
      sql_on:
      (
      ${an_explore.report_date} = ${orders.order_created_at_date}
      and lower(${an_explore.source}) = lower(${orders.source})
      and lower(${an_explore.medium}) = lower(${orders.medium})
      and lower(${an_explore.campaign}) = lower(${orders.campaign})
      and lower(${an_explore.keyword}) = lower(${orders.term})
      and ${an_explore.landing_page} = ${orders.landing_page}
      )
      ;;
    }
    
    join: items {
      relationship: one_to_many
      view_label: "Item Details"
      sql_on: ${orders.order_number} = ${items.order_number};;
    }

    join: id_lookup {
      relationship: many_to_one
      sql_on: ${items.subscription_id} = CAST(${id_lookup.subscription_id} AS STRING);;
      # fields: []
    }

    join: lifetime_value {
      relationship: one_to_one
      type: left_outer
      sql_on: cast(${id_lookup.subscription_id} as string) = ${lifetime_value.subscription_id};;
    }
    
    join: facebook_view {
      relationship: many_to_many
      sql_on:
      ${facebook_view.report_date} = ${an_explore.report_date}
      and (lower(${an_explore.campaign}) = lower(${facebook_view.campaign_name})
      or lower(${an_explore.medium}) = 'paidsocial'
      );;
    }
    
    join: google_view {
      relationship: many_to_many
      view_label: "Google Conversions"
      sql_on:
      ${google_view.report_date} = ${an_explore.report_date}
      and lower(${an_explore.source}) = 'google'
      and lower(${an_explore.medium}) = 'paid_search';;
    }
  }


I'm tempted to think that I need to go up a level and combine the data in BigQuery, but I'd prefer to expose as much data as possible to the customer.

I tried creating a "base_view.view" which just combines the date, source, medium, etc., but it ends up being huge and the dashboard takes 30 minutes to run.
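
For reference, a minimal sketch of that kind of base view, using simplified placeholder table and column names rather than the real schema:

base_view.view

view: base_view {
  derived_table: {
    sql:
      SELECT report_date, source, medium, campaign, term, landing_page
      FROM sessions
      UNION DISTINCT
      SELECT order_created_at_date, source, medium, campaign, term, landing_page
      FROM orders ;;
  }

  # Column names come from the first SELECT in the union.
  dimension: report_date { type: date sql: ${TABLE}.report_date ;; }
  dimension: source { type: string sql: ${TABLE}.source ;; }
  dimension: medium { type: string sql: ${TABLE}.medium ;; }
  dimension: campaign { type: string sql: ${TABLE}.campaign ;; }
  dimension: term { type: string sql: ${TABLE}.term ;; }
  dimension: landing_page { type: string sql: ${TABLE}.landing_page ;; }
}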

The desired outcome is to be able to filter the date off a single field in the Look/dashboard, to have the fields that are split by source/medium/campaign/term match up between the two tables, and to have data that's missing on one side still populate, as in an outer join.

1 Answer

Answer from Hamza:

To simplify the joins, I would start with a custom dimension in each of orders and session_counts. For orders it would look something like:

dimension: joinkey {
  type: string
  sql: ${order_created_at_date} || ${source} || ${medium} || ${campaign} || ${term} || ${landing_page} ;;
}

and join on that. Alternatively, you can create a derived table over both views and use that instead.
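
As a sketch, the matching dimension in session_counts and the simplified join could look roughly like this, reusing the field names from the model above. Note that non-string fields such as the date may need CAST(... AS STRING), and any lower() or COALESCE() normalisation has to be applied identically on both sides for the keys to match:

dimension: joinkey {
  type: string
  sql: ${report_date} || ${source} || ${medium} || ${campaign} || ${keyword} || ${landing_page} ;;
}

The orders join then collapses to a single condition:

join: orders {
  relationship: one_to_many
  type: full_outer
  view_label: "Order Details"
  sql_on: ${an_explore.joinkey} = ${orders.joinkey} ;;
}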

To address the speed issue:

  1. Do you really need to refresh this every 15 minutes if the dashboard actually takes 30 minutes to run? I would rethink max_cache_age: "15 minutes" (see the sketch after this list).
  2. To pre-filter the query data for faster response time, it is worth adding explore-level filters wherever applicable.


  3. If query time is critical, relevant settings such as Max connections per node would be helpful as well.
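
A rough sketch of points 1 and 2, assuming a placeholder trigger query and an illustrative 90-day window (both would need adjusting to the real data):

datagroup: default_datagroup {
  # Rebuild the cache when new data lands rather than on a fixed 15-minute clock;
  # the trigger query here is a placeholder.
  sql_trigger: SELECT MAX(report_date) FROM sessions ;;
  max_cache_age: "24 hours"
}

explore: an_explore {
  # Pre-filter so dashboards only scan a recent window; the 90-day cutoff is illustrative.
  sql_always_where: ${an_explore.report_date} >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY) ;;
  # existing joins unchanged
}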