I have a database that receives approximately 20,000 records per day, from 2022 to 2024. I have created 6 partitions for the quarters of 2022 and 2023. In total there are 14 million records between 2022 and 2023. When loading 1 day's data with a Python script it takes between 1 and 4 minutes without partitions, but with partitions it takes up to 10 minutes. Is this normal? I'm afraid that adding more data will make it take much longer.
Is there any way to make this not take so long?
DDL of the main table:
CREATE TABLE public.orders (
id varchar(20) NOT NULL,
"orderType" varchar(20) NULL,
description text NULL,
"createdDate" timestamp NULL,
samples json NULL,
discarded bool NULL,
"customerId" varchar(5) NULL,
"originId" varchar(4) NULL,
"labTestGroups" json NULL,
prices json NULL,
tax varchar(1) NULL,
status varchar(20) NULL
)
PARTITION BY RANGE ("createdDate");
CREATE INDEX "orders_createdDate_idx" ON ONLY public.orders USING btree ("createdDate");
And the DDL of my partitions looks like this:
CREATE TABLE public.orders2022_q1 PARTITION OF public.orders FOR VALUES FROM ('2022-01-01 00:00:00') TO ('2022-04-01 00:00:00');
asyncpg could do 1M rows/s between Python and PostgreSQL back in 2016; these days they advertise 2M/s. Even if you stick to psycopg3 or the psycopg2 I'm guessing you're using now, make sure you know there are multiple ways to do that.
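For instance, here is a minimal sketch of the difference with psycopg2, assuming a small list of tuples; the connection string, the sample data and the column subset are placeholders, not your actual script:

import psycopg2
from psycopg2.extras import execute_values

# Placeholder connection and data - adjust to your environment.
conn = psycopg2.connect("dbname=mydb user=me")
rows = [
    ("A-1", "routine", "2022-01-05 10:00:00"),
    ("A-2", "urgent", "2022-01-05 10:01:00"),
]

with conn, conn.cursor() as cur:
    # Slowest option: one INSERT statement and one round trip per row.
    # for row in rows:
    #     cur.execute(
    #         'INSERT INTO orders (id, "orderType", "createdDate") VALUES (%s, %s, %s)',
    #         row,
    #     )

    # Much faster: many rows per statement, far fewer round trips.
    execute_values(
        cur,
        'INSERT INTO orders (id, "orderType", "createdDate") VALUES %s',
        rows,
        page_size=1000,
    )

psycopg 3 additionally exposes server-side COPY through cursor.copy(), which is usually the fastest route short of switching drivers.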
14'000'000 records in 2022 is 38'356 a day, and if loading that takes 1-4 minutes, you're loading 160-640 rows a second (half that if you meant that's how much you had for both 2022 and 2023). That sounds like you picked the slowest, row-by-row insert method - that score sort of matches the test results in the link above. The reason you're slightly below that could mean each of your rows is heavier, but also that you're using string interpolation instead of parameter binding, slowing things down further.
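To illustrate the interpolation point - a sketch only, with cur, the column subset and the sample values standing in for whatever your script actually uses:

order_id, created, status = "A-3", "2022-01-05 10:02:00", "pending"

# Avoid: string interpolation builds a brand-new statement text per row
# (and invites SQL injection on top of the slowdown).
cur.execute(
    f"""INSERT INTO orders (id, "createdDate", status)
        VALUES ('{order_id}', '{created}', '{status}')"""
)

# Prefer: parameter binding - the statement text stays constant, only the
# values change, so the driver and server have less work per row.
cur.execute(
    'INSERT INTO orders (id, "createdDate", status) VALUES (%s, %s, %s)',
    (order_id, created, status),
)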
,"labTestGroups"
andprices
values are extremely longLoading directly into a table will always be faster than routing records to multiple tables through the partitioning mechanism while maintaining indexes, but looking at your score I'd say you should be able to gain way more performance client-side.
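And if you can change drivers, a rough sketch of the asyncpg COPY path; the connection string, column subset and sample data are placeholders, and the records must be Python tuples with proper types (e.g. datetime for the timestamp column):

import asyncio
from datetime import datetime

import asyncpg


async def load_day(rows):
    conn = await asyncpg.connect("postgresql://me@localhost/mydb")
    try:
        # COPY-based bulk load; this is the code path behind the rows/s
        # figures quoted above.
        await conn.copy_records_to_table(
            "orders",
            records=rows,
            columns=["id", "orderType", "createdDate", "status"],
        )
    finally:
        await conn.close()


sample = [
    ("A-1", "routine", datetime(2022, 1, 5, 10, 0), "received"),
    ("A-2", "urgent", datetime(2022, 1, 5, 10, 1), "received"),
]
asyncio.run(load_day(sample))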