I learned from the documentation about several join types supported by flink sql
https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/table/sql/queries/joins/
I'm thinking about which real-time data processing scenarios these join functions are suitable for This is what I understand now
1.Regular join If multiple streaming events occur almost at the same time, for example, a purchase behavior will generate order and discount usage information at the same time, they will be collected into kafka almost at the same time, and if we need to integrate these two types of information in flink for subsequent use Analysis, using regular join is a suitable choice. We assume that a match can be made within 5 seconds to set a 5-second state cleanup time. The order table supports any kind of updating (insert, update, delete)
sql like:
SELECT
order_id,
user_id,
sku_id,
coupon_amount
FROM
order od
left join order_detail_coupon odc
on odc.id = od.id
What I currently understand is only the scenario where business data occurs almost simultaneously.
2.Interval Joins If we clear in advance that two events do not occur at the same time, but have an uncertain interval, such as ordering and payment, using interval join is a suitable choice.
The order table can only be append-only.
sql like:
SELECT
order_id,
user_id,
sku_id,
p.payment_time,
p.payment_type
FROM
payment p, order_detail od
WHERE p.order_id = od.order_id
AND p.row_time BETWEEN od.row_time - INTERVAL '15' MINUTE AND od.row_time + INTERVAL '5' SECOND
3.Temporal Joins It is suitable for a connection table that keeps changing,and this change is reasonable and meaningful, such as the currency_rates in the example of Flink's official website. In this case, these changed versions will be saved. In addition to the key, the version will also be selected according to the time for association.
sql like:
SELECT
order_id,
price,
orders.currency,
conversion_rate,
order_time
FROM orders
LEFT JOIN currency_rates FOR SYSTEM_TIME AS OF orders.order_time
ON orders.currency = currency_rates.currency;
4.Lookup Join Suitable for data enrich scenarios, such as querying dimensional data with a certain key from hbase through hbase connector, Only one database query will be performed based on the system time at the time of operation.
sql like:
SELECT
id,
user_id,
appraise_code,
info.appraise_name
FROM comment_info c
join base_dic FOR SYSTEM_TIME AS OF c.proc_time as b
on c.appraise_code = b.rowkey"
Is my understanding correct? What are your usage scenarios of flink sql join? I would like to know more about the characteristics of these join types.
Learn about other people’s understanding of flink sql join type