I am running Cassandra with its default configuration in a default Docker installation on Windows 11. The table data contains 19 rows.
The Python driver is exceptionally slow and fails with a connection timeout in about 20% of cases.
I first suspected Docker or the container configuration, but RazorSQL has no issues, so I did some performance testing, comparing the official DataStax Python driver with the official DataStax .NET driver.
The results are devastating:
- Python: 22.908 seconds (!)
- .NET: 0.168 seconds
Is this normal behavior for the Python driver?
My python code:
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
import time

start = time.time()
for i in range(10):
    auth_provider = PlainTextAuthProvider(username="cassandra", password="cassandra")
    cluster = Cluster(["localhost"], auth_provider=auth_provider, connect_timeout=30)
    session = cluster.connect("rds")
    session.execute("SELECT COUNT(*) FROM data").one()
end = time.time()
print((end - start) / 10)
My C# code:
using Cassandra;
using System;
using System.Diagnostics;

public void TestReliability()
{
    Stopwatch stopwatch = new Stopwatch();
    stopwatch.Start();
    for (int i = 0; i < 100; i++)
    {
        Test();
    }
    stopwatch.Stop();
    Console.WriteLine("Average connect + one query in ms: " + (stopwatch.ElapsedMilliseconds / 100));
}

public void Test()
{
    Cluster cluster = Cluster.Builder()
        .AddContactPoint("localhost")
        .WithAuthProvider(new PlainTextAuthProvider("cassandra", "cassandra"))
        .Build();
    ISession session = cluster.Connect("rds");
    var result = session.Execute("SELECT COUNT(*) FROM data");
    session.Dispose();
    cluster.Dispose();
}
EDIT: The Python driver does not time out when the connection timeout is set high enough (35 seconds!).
In Cassandra applications, Cluster and Session should be singletons, because they are stateful (they handle load balancing and failover) and opening connections is expensive. Here you are opening a connection over and over again in a loop. Move those three lines outside of the loop and you should be back on your feet.
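As a rough sketch of that suggestion, the version below builds the Cluster and Session once and times only the repeated query. The helper names (connect, average_query_seconds, main) are made up for illustration; the host, credentials, keyspace and query are copied from the question. Note that it measures something different from the original loop, since connection setup is no longer included in each iteration.

```python
import time


def connect(keyspace="rds"):
    """Build the Cluster and Session once; reuse them for every query."""
    # Imported lazily so the timing helper below works without the driver installed.
    from cassandra.auth import PlainTextAuthProvider
    from cassandra.cluster import Cluster

    auth_provider = PlainTextAuthProvider(username="cassandra", password="cassandra")
    cluster = Cluster(["localhost"], auth_provider=auth_provider, connect_timeout=30)
    return cluster, cluster.connect(keyspace)


def average_query_seconds(session, query, runs=10):
    """Average wall-clock time of `runs` executions on an already-open session."""
    start = time.perf_counter()
    for _ in range(runs):
        session.execute(query).one()
    return (time.perf_counter() - start) / runs


def main():
    # Hypothetical entry point: assumes a Cassandra node is reachable on localhost.
    cluster, session = connect()
    try:
        print(average_query_seconds(session, "SELECT COUNT(*) FROM data"))
    finally:
        cluster.shutdown()  # release connections once, at the very end
```

Calling main() against a running node should print the average per-query time. In a real application the session would typically live for the lifetime of the process rather than for a single function call.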