python aiomysql querying time increases with number of parallel tasks


I'm using Python's aiomysql to query a MySQL database in async apps, together with anyio's create_task_group to run parallel tasks. Sometimes I need to fetch data from the database in chunks, and to speed up querying I create a task group and run the queries in parallel. But I ran into a problem: the time of a single query increases with the number of tasks. Here is a snippet that reproduces it:

import os
import aiomysql
from time import time
from anyio import run, create_task_group
from dotenv import load_dotenv

diffs = []  # contains each single query time
tot_reqs = 0  # count number of total requests


async def measure(pool):
    global diffs
    global tot_reqs

    async with pool.acquire() as conn:
        start_time = time()
        while True:
            t1 = time()
            cur = await conn.cursor()
            sql = "select * from ttable limit 10000"
            await cur.execute(query=sql)
            await cur.close()

            diffs.append(time() - t1)
            tot_reqs += 1
            if len(diffs) > 9:  # print mean time of every 10 queries
                print(
                    f"[{time()-start_time:.2f}s since started] Mean query time: {sum(diffs)/len(diffs)} (total requests performed: {tot_reqs})"
                )
                diffs = []


async def case():
    db_env_prefix = "SOME_MYSQL"
    pool = await aiomysql.create_pool(
        host=os.getenv(db_env_prefix + "_HOST"),
        port=int(os.getenv(db_env_prefix + "_PORT") or 3306),
        user=os.getenv(db_env_prefix + "_USERNAME"),
        password=os.getenv(db_env_prefix + "_PASSWORD"),
        db=os.getenv(db_env_prefix + "_DB"),
        maxsize=10,
        autocommit=True,
        pool_recycle=600,
    )
    async with create_task_group() as tg:
        for _ in range(10):
            tg.start_soon(measure, pool)
    pool.close()
    await pool.wait_closed()


if __name__ == "__main__":
    load_dotenv()
    run(case)

the printed output is:

[0.83s since started] Mean query time: 0.5825932741165161 (total requests performed: 10)
[2.95s since started] Mean query time: 0.989139986038208 (total requests performed: 20)
[4.23s since started] Mean query time: 1.238171124458313 (total requests performed: 30)
[5.53s since started] Mean query time: 1.2967675924301147 (total requests performed: 40)
[6.83s since started] Mean query time: 1.2985524892807008 (total requests performed: 50)
[8.14s since started] Mean query time: 1.3030725479125977 (total requests performed: 60)
[9.44s since started] Mean query time: 1.3051365852355956 (total requests performed: 70)
[10.75s since started] Mean query time: 1.3047129154205321 (total requests performed: 80)
[12.05s since started] Mean query time: 1.3064133167266845 (total requests performed: 90)
[13.36s since started] Mean query time: 1.3030510425567627 (total requests performed: 100)

so by running 10 tasks in parallel I get about 100 requests finished in ~13 seconds, with a mean time of 1.3 s per query. Then I run only one task:

async with create_task_group() as tg:
        for _ in range(1):
            tg.start_soon(measure, pool)

I got

[1.24s since started] Mean query time: 0.12407510280609131 (total requests performed: 10)
[2.54s since started] Mean query time: 0.13026781082153321 (total requests performed: 20)
[3.85s since started] Mean query time: 0.13041894435882567 (total requests performed: 30)
[5.16s since started] Mean query time: 0.13082268238067626 (total requests performed: 40)
[6.47s since started] Mean query time: 0.13111093044281005 (total requests performed: 50)
[7.78s since started] Mean query time: 0.13118529319763184 (total requests performed: 60)
[9.09s since started] Mean query time: 0.1312186002731323 (total requests performed: 70)
[10.40s since started] Mean query time: 0.13101680278778077 (total requests performed: 80)
[11.72s since started] Mean query time: 0.13155796527862548 (total requests performed: 90)
[13.03s since started] Mean query time: 0.131210994720459 (total requests performed: 100)

the same 100 requests in ~13 s (a single task with a ~10x faster single query; mean query time is 0.13 s). There are no CPU-bound tasks, only I/O requests.

So increasing the number of tasks brings no actual improvement in querying time: the more tasks, the slower each single query. I tried querying different tables and changing MySQL settings such as increasing innodb_thread_concurrency, innodb_read_io_threads, etc. It doesn't look like a table/DB configuration problem, because when I run the same code in several terminals at the same time, single-query time doesn't suffer and scales as I expected from the tg.start_soon approach. Multiprocessing would help, but it has its own disadvantages. And again, this is strange behaviour for an I/O task; nothing should block an async request.
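One likely explanation (an assumption, not confirmed by profiling here): aiomysql, like PyMySQL, decodes result rows in pure Python on the event-loop thread. Only the network waits are truly asynchronous; the per-packet parsing is CPU work, and every task shares the one loop (and one core) doing it. The sketch below simulates that pattern without MySQL: each "query" alternates yield points with chunks of pure-Python work, and the measured per-query wall time grows roughly with the number of concurrent tasks while total throughput stays flat, the same shape as the output above. Separate processes (like the multiple terminals) each get their own loop and core, which is why they scale.

```python
import asyncio
import time


async def fake_query(chunks=20, work=20_000):
    """Simulate one query: each awaited 'packet' is followed by
    pure-Python parsing that holds the event loop (like row decoding)."""
    t0 = time.perf_counter()
    for _ in range(chunks):
        await asyncio.sleep(0)           # yield, as if awaiting the next packet
        sum(i * i for i in range(work))  # CPU-bound 'parsing' of that packet
    return time.perf_counter() - t0


async def mean_query_time(n_tasks, queries_per_task=3):
    times = []

    async def worker():
        for _ in range(queries_per_task):
            times.append(await fake_query())

    await asyncio.gather(*(worker() for _ in range(n_tasks)))
    return sum(times) / len(times)


mean_1 = asyncio.run(mean_query_time(1))
mean_10 = asyncio.run(mean_query_time(10))
print(f"1 task  : mean query time {mean_1:.4f}s")
print(f"10 tasks: mean query time {mean_10:.4f}s")  # roughly 10x slower per query
```

The total wall time of both runs is about the same, only the per-query latency differs, which matches the two outputs above.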

UPD: As requested in comments:

MariaDB [(none)]> SELECT 'SLEEPING MDB Ram use', COUNT(*),SUM(time),SUM(memory_used),SUM(max_memory_used) FROM information_schema.processlist WHERE command="Sleep";
+----------------------+----------+-----------+------------------+----------------------+
| SLEEPING MDB Ram use | COUNT(*) | SUM(time) | SUM(memory_used) | SUM(max_memory_used) |
+----------------------+----------+-----------+------------------+----------------------+
| SLEEPING MDB Ram use |        1 |       309 |            79632 |               162472 |
+----------------------+----------+-----------+------------------+----------------------+
1 row in set

1 Answer

Answer by Wilson Hauck (score: 7):

Suggestions to consider for your my.cnf [mysqld] section:

thread_pool_size=6  # from 8 - the docs suggest at most 80% of your 8 cores
table_open_cache_instances=1  # from 8, until you have more than 1,000 tables
net_buffer_length=98304  # from 16384 to reduce packet in/out count
innodb_io_capacity=900  # from 200 to use more of your NVMe IOPS
tmp_table_size=33554432  # from 16M to 32M to expand capacity
max_heap_table_size=33554432  # from 16M to 32M to reduce created_tmp_disk_tables

There are more opportunities to improve performance. Please view my profile.

Expecting your queries to complete quicker. When streaming a single set of queries through one connection, you may not achieve as much parallel processing as you hoped for. Please post new completion times after implementation, if time permits.