Why is my Python app slow connecting to Cassandra compared to C#?

I am running Cassandra with its default configuration in a default Docker installation on Windows 11. The table data contains 19 rows.

The Python driver is exceptionally slow and fails with a connection timeout in about 20% of cases.

At first I suspected Docker or the container configuration, but RazorSQL has no such issues, so I ran a performance test comparing the official DataStax Python driver with the official DataStax .NET driver.

The results are devastating:

  • Python: 22.908 seconds (!)
  • .NET: 0.168 seconds

Is this normal behavior for the Python driver?

My Python code:

from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
import time
start = time.time()
for i in range(10):
    auth_provider = PlainTextAuthProvider(username="cassandra", password="cassandra")
    cluster=Cluster(["localhost"], auth_provider=auth_provider,connect_timeout=30)
    session=cluster.connect("rds")
    session.execute("SELECT COUNT(*) FROM data").one()
end = time.time()
print((end - start)/10)

My C# code:

using Cassandra;
using System;
using System.Diagnostics;
public void TestReliability()
{
    Stopwatch stopwatch = new Stopwatch();
    stopwatch.Start();
    for (int i = 0; i < 100; i++){Test();}
    stopwatch.Stop();
    Console.WriteLine("Average connect + one query in ms: " + (stopwatch.ElapsedMilliseconds / 100));
}
public void Test()
{
    Cluster cluster = Cluster.Builder().AddContactPoint("localhost").WithAuthProvider(new PlainTextAuthProvider("cassandra", "cassandra")).Build();
    ISession session = cluster.Connect("rds");
    var result=session.Execute("SELECT COUNT(*) FROM data");
    session.Dispose();
    cluster.Dispose();
}

EDIT: The Python driver does not time out when connect_timeout is set high enough (35 seconds).

There are 2 answers

clunven

In Cassandra applications, Cluster and Session should be singletons: they are stateful (they handle load balancing and failover), and opening connections is expensive.

Here you are opening a new connection on every iteration of the loop. Move these three lines outside the loop and performance should recover.

auth_provider = PlainTextAuthProvider(username="cassandra", password="cassandra")
cluster=Cluster(["localhost"], auth_provider=auth_provider,connect_timeout=30)
session=cluster.connect("rds")
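
As a rough sketch of that advice (not a definitive benchmark), the version below connects once and times only the queries. The names average_query_seconds and run_benchmark are made up for illustration; the contact point, credentials, keyspace and table are taken from the question. The driver import sits inside run_benchmark so the timing helper can be tried without a running node.

```python
import time


def average_query_seconds(session, query, runs=10):
    """Return the average wall-clock seconds per query on an already-connected session."""
    start = time.perf_counter()
    for _ in range(runs):
        session.execute(query).one()
    return (time.perf_counter() - start) / runs


def run_benchmark():
    # Hypothetical entry point; needs cassandra-driver and a reachable node.
    from cassandra.auth import PlainTextAuthProvider
    from cassandra.cluster import Cluster

    auth_provider = PlainTextAuthProvider(username="cassandra", password="cassandra")
    cluster = Cluster(["localhost"], auth_provider=auth_provider, connect_timeout=30)
    session = cluster.connect("rds")  # connect once, outside the timed loop
    try:
        return average_query_seconds(session, "SELECT COUNT(*) FROM data")
    finally:
        cluster.shutdown()  # release connections when the application is done
```

Calling run_benchmark() against the setup from the question would then measure the per-query time rather than the per-connection time.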

Erick Ramirez

Your test looks invalid to me (more on this later). You are not following the usage guidelines: (1) use a single Cluster object and (2) use a single Session object for the life of the application, because (3) maintaining multiple instances is expensive.

But looking specifically at the sample code you posted, you are not comparing apples to apples.

In your C# code, you are making explicit calls to Dispose():

    session.Dispose();
    cluster.Dispose();

which close all connections and clean up resources. However, you are not doing the same in your Python code, so the older connections (and their associated resources) are still held by the app in the background.

To make the two samples more comparable, call Session.shutdown() and Cluster.shutdown() in the Python code. For more info, see the cassandra.cluster API documentation for the Python driver.
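
A minimal sketch of what that could look like, assuming you still want to time connect plus one query per iteration as the C# code does. The names timed_connect_and_query and make_local_cluster are hypothetical; Cluster.shutdown() is the driver call that closes the cluster's sessions and connections, so it plays the role of the two Dispose() calls.

```python
import time


def timed_connect_and_query(make_cluster, keyspace, query, runs=10):
    """Average seconds for connect + one query + shutdown, repeated `runs` times."""
    start = time.perf_counter()
    for _ in range(runs):
        cluster = make_cluster()
        try:
            session = cluster.connect(keyspace)
            session.execute(query).one()
        finally:
            # Closes every session and connection owned by this Cluster,
            # so nothing from earlier iterations lingers in the background.
            cluster.shutdown()
    return (time.perf_counter() - start) / runs


def make_local_cluster():
    # Hypothetical factory using the settings from the question;
    # needs cassandra-driver and a reachable node.
    from cassandra.auth import PlainTextAuthProvider
    from cassandra.cluster import Cluster

    auth_provider = PlainTextAuthProvider(username="cassandra", password="cassandra")
    return Cluster(["localhost"], auth_provider=auth_provider, connect_timeout=30)
```

With a node running, timed_connect_and_query(make_local_cluster, "rds", "SELECT COUNT(*) FROM data") gives a figure that is closer in meaning to the C# measurement, though it is still a connection benchmark rather than a picture of normal application behavior.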

In any case, your test isn't valid because it isn't how applications behave in real life. If you tell us what problem you're trying to solve or what you're trying to achieve, we would be able to provide a better answer.

If you are interested, I recommend having a look at Best practices for Cassandra drivers. Cheers!