How do I fix the "Unable to complete the operation against any hosts" error in Cassandra?


I have a fairly simple AWS Lambda function in which I connect to an Amazon Keyspaces (for Apache Cassandra) database. The Python code below works, but from time to time I get the error shown. How do I fix this intermittent behavior? My assumption is that additional settings are needed when initializing the cluster, for example set_max_connections_per_host. I would appreciate any help.

ERROR:

('Unable to complete the operation against any hosts', {<Host: X.XXX.XX.XXX:XXXX eu-central-1>: ConnectionShutdown('Connection to X.XXX.XX.XXX:XXXX was closed')})

lambda_function.py:

import sessions


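# Keep the session in a module-level variable so it is reused across
# warm Lambda invocations instead of reconnecting on every call.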
cassandra_db_session = None
cassandra_db_username = 'your-username'
cassandra_db_password = 'your-password'
cassandra_db_endpoints = ['your-endpoint']
cassandra_db_port = 9142


def lambda_handler(event, context):
    global cassandra_db_session
    if not cassandra_db_session:
        cassandra_db_session = sessions.create_cassandra_session(
            cassandra_db_username,
            cassandra_db_password,
            cassandra_db_endpoints,
            cassandra_db_port
        )
    result = cassandra_db_session.execute('select * from "your-keyspace"."your-table";')
    return 'ok'

sessions.py:

from ssl import SSLContext
from ssl import CERT_REQUIRED
from ssl import PROTOCOL_TLSv1_2
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
from cassandra.policies import DCAwareRoundRobinPolicy


def create_cassandra_session(db_username, db_password, db_endpoints, db_port):
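    # Amazon Keyspaces requires TLS 1.2 and verification against the Amazon root CA.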
    ssl_context = SSLContext(PROTOCOL_TLSv1_2)
    ssl_context.load_verify_locations('your-path/AmazonRootCA1.pem')
    ssl_context.verify_mode = CERT_REQUIRED
    auth_provider = PlainTextAuthProvider(username=db_username, password=db_password)
    cluster = Cluster(
        db_endpoints,
        ssl_context=ssl_context,
        auth_provider=auth_provider,
        port=db_port,
        load_balancing_policy=DCAwareRoundRobinPolicy(local_dc='eu-central-1'),
        protocol_version=4,
        connect_timeout=60
    )
    session = cluster.connect()
    return session

There are 2 answers

Answer from Erick Ramirez:

There isn't much point setting the maximum connections on the client side, since AWS Lambda functions are effectively "dead" between runs. For the same reason, the recommendation is to disable driver heartbeats (with idle_heartbeat_interval = 0), since no activity occurs until the next time the function is invoked.
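As an illustration only, here is a minimal sketch of the cluster setup with heartbeats disabled via idle_heartbeat_interval=0; the endpoint, credentials, certificate path, and local DC are placeholders carried over from your example:

from ssl import SSLContext, CERT_REQUIRED, PROTOCOL_TLSv1_2
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
from cassandra.policies import DCAwareRoundRobinPolicy

ssl_context = SSLContext(PROTOCOL_TLSv1_2)
ssl_context.load_verify_locations('your-path/AmazonRootCA1.pem')
ssl_context.verify_mode = CERT_REQUIRED

cluster = Cluster(
    ['your-endpoint'],
    port=9142,
    ssl_context=ssl_context,
    auth_provider=PlainTextAuthProvider(username='your-username', password='your-password'),
    load_balancing_policy=DCAwareRoundRobinPolicy(local_dc='eu-central-1'),
    protocol_version=4,
    connect_timeout=60,
    idle_heartbeat_interval=0  # disable heartbeats; the Lambda is idle between invocations anyway
)
session = cluster.connect()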

This doesn't necessarily cause the issue you are seeing, but there's a good chance the connection is being reused by the driver after it has been closed on the server side.

With the lack of public documentation on the inner workings of AWS Keyspaces, it's difficult to know what is happening on the cluster. I've always suspected that AWS Keyspaces has a CQL-like API engine in front of DynamoDB, so there are quirks like the one you're seeing that are hard to track down, since doing so requires knowledge only available internally at AWS.

FWIW the DataStax drivers aren't tested against AWS Keyspaces.

Answer from Aaron:

This is the biggest issue that I see:

result = cassandra_db_session.execute('select * from "your-keyspace"."your-table";')

The code looks fine, but I don't see a WHERE clause. If there's a lot of data, a single node (chosen as the coordinator) has to build the result set while pulling data from all the other nodes. That results in unpredictably bad performance, which could explain why it works sometimes but not others.

Pro-tip: All queries in Cassandra should have a WHERE clause.
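For example, restricting the query to a single partition lets the coordinator serve it from one replica set instead of scanning the whole table. A minimal sketch, assuming the table has a partition key column named id (the keyspace, table, and bound value are placeholders):

# Prepare once and bind the partition key value per request.
prepared = cassandra_db_session.prepare(
    'SELECT * FROM "your-keyspace"."your-table" WHERE id = ?'
)
rows = cassandra_db_session.execute(prepared, ['some-partition-key-value'])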