ClickHouse - Duplicate Rows in Materialized View Despite Unchanged Setup

79 views Asked by At

I'm facing an issue with creating materialized views in ClickHouse where each row is duplicated in the materialized view, despite no changes to my setup. My data warehouse is populated through an ELT process from a PostgreSQL database using Airbyte. Interestingly, the issue persists even when creating the materialized view directly through the ClickHouse console.

The problem surfaced about 3-4 weeks ago. Previously, the materialized view was created as expected without any duplicates. Running the query outside of a materialized view context returns the correct result set (approximately 8 million rows). However, when I create a materialized view with the same query, the result set unexpectedly doubles in size to approximately 16 million rows, effectively duplicating each row.

Here's the query used for the materialized view:

CREATE MATERIALIZED VIEW transactions
ENGINE = MergeTree
ORDER BY (ContainerLocationsId) POPULATE AS 

    SELECT
        stores.Id AS StoreId,
        stores.Type as StoreSubtype,
        stores.Name AS StoreName,
        stores.City AS City,
        containerLocations.Id AS ContainerLocationsId,
        containerLocations.OwnerSince AS OwnerSince,
        containerLocations.OwnerTill AS OwnerTill,
        containerLocations.OwnerId AS OwnerId,
        containerLocations.NextOwnerId AS NextOwnerId,
        containerLocations.ContainerId AS ContainerId,
        containerLocations.TransactionType AS TransactionType,
        containerTypes.Name AS ContainerTypeName
    FROM containerLocations 
    LEFT JOIN userLinks ON containerLocations.OwnerId = userLinks.UserId
    LEFT JOIN containers ON containers.Id = containerLocations.ContainerId
    LEFT JOIN stores ON userLinks.StoreId = stores.Id 
    LEFT JOIN storeCategories ON storeCategories.Id = stores.CategoryId    
    LEFT JOIN containerTypes ON containerTypes.Id = containers.TypeId    
    WHERE containerLocations.IsBillableTransaction = 1

The ContainerLocationsId is designed to be a unique, incrementing ID. I haven't made any modifications to the query or the underlying data structure that could account for this sudden change in behavior.

The data successfully loads into the warehouse without any duplication issues when running the query normally. Despite attempts to resolve the issue by adding more columns to the sorting key, the duplication persists. The integrity of the original tables in the warehouse confirms that the ELT process is operating correctly. Additionally, my search through the ClickHouse documentation has not yielded a solution. As I am relatively new to ClickHouse, I'm uncertain about the cause of this issue.

0

There are 0 answers