Cassandra exception under load

We are running Cassandra in production with the following configuration:

  • 16 core CPU
  • 56 GB Memory
  • 4 Nodes
  • Secondary disk type: SSD

Driver versions:

- Spring Data Cassandra: 1.5.1
- Cassandra driver core (bundled with Spring Data Cassandra): 3.1.4
- Native protocol: v4

Cassandra version: DSE 5.0.5, which uses Cassandra 3.0.11.1485

CPU consumption:

  • Node 1 : 80%
  • Node 2 : 30%
  • Node 3 : 20%
  • Node 4 : 40%

The Cassandra connection configuration is as follows:

spring.data.cassandra.cluster-name= Test Cluster
spring.data.cassandra.compression= none
spring.data.cassandra.connect-timeout-millis= 5000
spring.data.cassandra.keyspace-name= event
spring.data.cassandra.contact-points=host1,host2,host3,host4
spring.data.cassandra.port= 9042
spring.data.cassandra.max.request.per.connection.local=32768
spring.data.cassandra.max.request.per.connection.remote=2000
spring.data.cassandra.max-core-connection.local=8
spring.data.cassandra.max-core-connection.remote=2
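
To make the pool settings above easier to reason about, here is a minimal sketch of the arithmetic they imply. It assumes the pool is fully open (core and max connections are equal, as configured below) and that every connection can carry its configured maximum number of requests at once. The class and method names are hypothetical, and nothing here calls the driver.

```java
/** Hypothetical helper: plain arithmetic only, no driver calls. */
final class PoolCapacitySketch {

    private PoolCapacitySketch() {
    }

    /**
     * Upper bound on simultaneous in-flight requests to one host,
     * assuming all connections are open and each is filled to its limit.
     */
    static long maxInFlightPerHost(int connectionsPerHost, int maxRequestsPerConnection) {
        return (long) connectionsPerHost * maxRequestsPerConnection;
    }
}
```

With the values listed above, `maxInFlightPerHost(8, 32768)` gives 262144 for a local host and `maxInFlightPerHost(2, 2000)` gives 4000 for a remote host. This is only the ceiling the client allows itself; it says nothing about how many concurrent requests a node can actually serve.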

Cassandra configuration class:

@Configuration
public class CassandraConfiguration {

    @Value("${spring.data.cassandra.cluster-name}")
    public String clusterName;

    @Value("${spring.data.cassandra.keyspace-name}")
    public String keySpace;

    @Value("${spring.data.cassandra.contact-points}")
    private String contactpoints;

    @Value("${spring.data.cassandra.port}")
    public int port;

    @Value("${spring.data.cassandra.max.request.per.connection.local}")
    public int localRequestConnection;
    @Value("${spring.data.cassandra.max.request.per.connection.remote}")
    public int remoteRequestConnection;

    // Connections per host; keys match the max-core-connection.* properties.
    @Value("${spring.data.cassandra.max-core-connection.local}")
    public int localConnection;
    @Value("${spring.data.cassandra.max-core-connection.remote}")
    public int remoteConnection;

    @Value("${spring.data.cassandra.connect-timeout-millis}")
    public int poolTimeoutMillis;

    @Bean
    public CassandraClusterFactoryBean cluster() {
        CassandraClusterFactoryBean cluster = new CassandraClusterFactoryBean();
        cluster.setContactPoints(contactpoints);
        cluster.setPort(port);
        cluster.setClusterName(clusterName);
        cluster.setCompressionType(CompressionType.NONE);
        cluster.setPoolingOptions(getCassandraPool());

        return cluster;
    }

    @Bean
    public PoolingOptions getCassandraPool() {
        // Core and max connections are set to the same value, so the pool size per host is fixed.
        PoolingOptions poolingOptions = new PoolingOptions();
        poolingOptions
            .setMaxRequestsPerConnection(HostDistance.LOCAL, localRequestConnection)
            .setMaxRequestsPerConnection(HostDistance.REMOTE, remoteRequestConnection)
            .setCoreConnectionsPerHost(HostDistance.LOCAL, localConnection)
            .setCoreConnectionsPerHost(HostDistance.REMOTE, remoteConnection)
            .setMaxConnectionsPerHost(HostDistance.LOCAL, localConnection)
            .setMaxConnectionsPerHost(HostDistance.REMOTE, remoteConnection)
            .setPoolTimeoutMillis(poolTimeoutMillis);
        return poolingOptions;
    }

    @Bean
    public CassandraMappingContext mappingContext() {
        return new BasicCassandraMappingContext();
    }

    @Bean
    public CassandraConverter converter() {
        return new MappingCassandraConverter(mappingContext());
    }

    @Bean
    public CassandraSessionFactoryBean session() throws Exception {
        CassandraSessionFactoryBean session = new CassandraSessionFactoryBean();
        session.setCluster(cluster().getObject());
        session.setKeyspaceName(keySpace);
        session.setConverter(converter());
        return session;
    }

    @Bean
    public CassandraOperations cassandraTemplate() throws Exception {
        return new CassandraTemplate(session().getObject());
    }
}

The problem we are facing:

Under normal traffic everything works well, but as soon as traffic increases we get the following error from the Cassandra driver:

com.datastax.driver.core.exceptions.OperationTimedOutException: [/105.12.14.114:9042] Timed out waiting for server response
        at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310)
        at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
        at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
        at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
        at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
        at com.datastax.driver.core.DefaultResultSetFuture.onException(DefaultResultSetFuture.java:202)
        at com.datastax.driver.core.RequestHandler.setFinalException(RequestHandler.java:201)
        at com.datastax.driver.core.RequestHandler.access$2400(RequestHandler.java:46)
        at com.datastax.driver.core.RequestHandler$SpeculativeExecution.setFinalException(RequestHandler.java:795)
        at com.datastax.driver.core.RequestHandler$SpeculativeExecution.processRetryDecision(RequestHandler.java:421)
        at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onTimeout(RequestHandler.java:778)
        at com.datastax.driver.core.Connection$ResponseHandler$1.run(Connection.java:1374)
        at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:581)
        at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:655)
        at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:367)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
        at java.lang.Thread.run(Thread.java:745)
Caused by: com.datastax.driver.core.exceptions.OperationTimedOutException: [/105.12.14.114:9042] Timed out waiting for server response
        at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onTimeout(RequestHandler.java:772)
        ... 6 more

Edit 1:

I have around 60 tables similar to the one below, which are used for different use cases.

Cassandra table:

create table IF NOT EXISTS kspace.count_table (
    source_id bigint,
    name varchar,
    date text,
    pname varchar,
    ptype varchar,
    pvalue blob,
    count counter,
    unique_count counter,
    PRIMARY KEY ((source_id, name, pname, ptype, date), pvalue)
);

Cassandra queries:

We have several kinds of queries for our different use cases. The queries against the example table, which resemble our original queries, fall into these groups:

  • Single queries (around 12)
  • Batch queries within the same partition key (around 30 queries)
  • Batch queries across different partition keys (around 20 queries)
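
To make the three groups concrete, here is a minimal sketch of how such statements might look for the table above. The statement text is an assumption (the real queries are not shown in this question), and the class and method names are hypothetical. It only builds CQL strings and does not call the driver.

```java
import java.util.StringJoiner;

/** Hypothetical helper: builds CQL text only, no driver calls. */
final class CounterQuerySketch {

    // Assumed statement shape; the real queries are not shown in the question.
    static final String UPDATE =
        "UPDATE kspace.count_table SET count = count + 1 "
            + "WHERE source_id = ? AND name = ? AND pname = ? "
            + "AND ptype = ? AND date = ? AND pvalue = ?";

    private CounterQuerySketch() {
    }

    /** A single counter update (first group). */
    static String single() {
        return UPDATE + ";";
    }

    /** Wraps the given number of counter updates in one counter batch. */
    static String counterBatch(int statements) {
        if (statements < 1) {
            throw new IllegalArgumentException("a batch needs at least one statement");
        }
        StringJoiner batch =
            new StringJoiner("\n", "BEGIN COUNTER BATCH\n", "\nAPPLY BATCH;");
        for (int i = 0; i < statements; i++) {
            batch.add("  " + UPDATE + ";");
        }
        return batch.toString();
    }
}
```

Whether a batch falls into the second or third group depends only on the bound values. If every statement binds the same source_id, name, pname, ptype and date and differs only in pvalue, the batch stays within one partition. Otherwise it spans several partitions.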

Edit 2: We started facing this problem after migrating from Apache Cassandra 2.1.8 to DSE 5.0.5.
