I have a requirement where data needs to be read in real time. I have two sources available for this data:
- A BigQuery table that is refreshed daily with data from yesterday.
- An API that can provide the data in real time but is limited to a one-hour window (the last hour).
I already built an Event Stream with a Python notebook source that gets the data from the API. The Python code runs every 5 seconds and fetches the data from the last 4 seconds. The Event Stream then pushes the data into a KQL database. This currently works great!
What I want to do now is the following: import historical data up to yesterday from BigQuery, and DirectQuery today's data from the real-time KQL database. I want both results in the same table in my Power BI dataset so I can combine them with dimensional data and build a sort of centralized datamart for reporting purposes.
My question is the following: Is it possible to either:
- Use hybrid tables (Dataflows) to incrementally refresh my data from BigQuery up to yesterday and get the rest from KQL in real time?
- Or combine/append the data from both of these sources while keeping the mixed DirectQuery/Import aspect?
The first scenario would be preferable for maintainability reasons.
If neither one of these scenarios is possible, what architecture/tips would you recommend to both have real-time data from an API and historical data from BigQuery and be able to combine them into one table/dataflow to be used in a centralized dataset?
In advance, thank you very much for any help.
The most straightforward way to implement this is to store all the data in a KQL database and use Incremental Refresh with a DirectQuery partition.
So you would load the BigQuery data into the KQL database, or into OneLake and access it in KQL through a OneLake shortcut.
In either case you are storing the data in Fabric instead of importing it into Power BI.
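Whichever storage option you pick, the hybrid-table behavior you're after boils down to a date-based cutover: rows with a timestamp before today come from the imported (BigQuery-sourced) partitions, and rows from today come from the DirectQuery partition over KQL. A stdlib-only sketch of that cutover rule, with illustrative table names and a made-up `ts` field:

```python
from datetime import date, datetime

# Illustrative rows: historical (BigQuery, up to yesterday) and real-time (KQL, today).
historical = [
    {"ts": datetime(2024, 1, 1, 8, 0), "value": 10},
    {"ts": datetime(2024, 1, 2, 9, 30), "value": 20},
]
realtime = [
    {"ts": datetime(2024, 1, 3, 10, 15), "value": 30},
]

def combined_view(historical, realtime, today: date) -> list[dict]:
    """Union the two sources with a hard cutover at midnight today,
    so no row is double-counted where the sources overlap."""
    cutoff = datetime.combine(today, datetime.min.time())
    return (
        [r for r in historical if r["ts"] < cutoff]    # import partitions
        + [r for r in realtime if r["ts"] >= cutoff]   # DirectQuery partition
    )

rows = combined_view(historical, realtime, today=date(2024, 1, 3))
```

With incremental refresh, Power BI applies the same boundary for you via the `RangeStart`/`RangeEnd` partition filters; the sketch just makes the rule explicit.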
It's also possible to have two separate tables: one imported from BigQuery using Incremental Refresh, and another using DirectQuery against KQL. But in that case each visual would have to reference one table or the other. You can, however, create measures that aggregate over both tables.