I have a problem that I just can't solve on my own. There is a Python script that looks something like:
from threading import Thread
import threading
import pymssql

lock = threading.Lock()
global_list = []
max_workers = 1
workers_online = 0

def get_query(my_query, var1, list1):  # further arguments elided
    global workers_online, global_list
    connection = pymssql.connect(params)  # connection parameters elided
    cursor = connection.cursor()
    cursor.execute(my_query)
    records = cursor.fetchall()
    cursor.close()
    connection.close()
    # do something with records
    with lock:
        workers_online -= 1
        global_list.append('some data')

def main():
    for query in main_list:  # main_list holds the queries
        # busy-wait until a worker slot frees up
        while workers_online >= max_workers:
            pass
        workers_online += 1  # note: incremented without holding the lock
        th = Thread(target=get_query, args=(query, var1, list1), daemon=False)
        th.start()
        # th.join()
The script sends queries to the MSSQL database using the pymssql library in a for loop. Each query returns approximately 10 to 1,000 rows, but there are also large ones of about 100k rows. On the server side everything executes quickly, 1-10 seconds per query.
The script runs on the local machine: Windows 10 Pro x64, i5-11300H, 16 GB RAM.
But here's the problem.
If I don't use threading at all (no threading library in the script) or use a single thread with th.join() (commented out in the code above), then everything works quickly.
As soon as I use more than one thread and stop calling join, this part: records = cursor.fetchall() begins to take a very long time to execute on queries that return ~10k rows or more. If I reduce the query results with 'SELECT TOP 1000', everything is fast again.
On the MSSQL server side, the request hangs with wait type ASYNC_NETWORK_IO and no memory consumption, which as far as I understand means the server is waiting for the client to consume the result set.
It seems like insufficient resources on my side, not the server's.
Of course I can uncomment th.join(), but then there would be no difference from a script without threading at all.
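One modification I have been thinking about is to drain the result set in chunks with cursor.fetchmany() instead of a single fetchall(), so the client keeps pulling rows off the network as they arrive. A minimal sketch of the idea (the connection parameters are elided as above, and chunk_size is an arbitrary value):

def get_query_chunked(my_query, chunk_size=5000):
    connection = pymssql.connect(params)  # connection parameters elided
    cursor = connection.cursor()
    cursor.execute(my_query)
    records = []
    while True:
        # pull the result set off the wire in chunks rather than all at once
        chunk = cursor.fetchmany(chunk_size)
        if not chunk:
            break
        records.extend(chunk)
    cursor.close()
    connection.close()
    return records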
I'm trying to understand why this happens. Perhaps someone has already encountered this. Is it possible to modify the existing script or use a different approach to implement this idea?
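For example, would concurrent.futures.ThreadPoolExecutor be a better fit, letting the pool manage the worker count instead of my own workers_online counter and busy-wait? A rough sketch, assuming get_query is changed to return its records instead of appending to global_list (max_workers=4 is just an example value):

from concurrent.futures import ThreadPoolExecutor, as_completed

def main():
    results = []
    # the pool enforces the concurrency limit, replacing the busy-wait
    with ThreadPoolExecutor(max_workers=4) as executor:
        futures = [executor.submit(get_query, query, var1, list1)
                   for query in main_list]
        for future in as_completed(futures):
            # gather results as each worker finishes
            results.append(future.result())
    return results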