What is the difference between using Spark with Hive and Spark with any other NoSQL or SQL database?

I am new to Spark. I have been trying to use Spark with Hive, MySQL, and Cassandra. However, I still don't know the differences between them: which is slower, which is more expensive, what their disadvantages are, and how they actually work.

Can anyone here help me figure out the differences between them? If possible, I would also like some examples, please!

Thank you everyone!

There is 1 answer

Erick Ramirez

To connect to a Cassandra database from a Spark application, you need to use the Spark Cassandra connector library. I am not aware of alternative options that would allow you to connect to Cassandra otherwise.

Here's an example that shows how to use the connector from a Spark 3.2 cluster with spark-shell:

$ spark-shell \
  --packages com.datastax.spark:spark-cassandra-connector_2.12:3.2.0 \
  --master <master_url> \
  --conf spark.cassandra.connection.host=cass_ip \
  --conf spark.cassandra.auth.username=cass_user \
  --conf spark.cassandra.auth.password=cass_pass \
  --conf spark.sql.extensions=com.datastax.spark.connector.CassandraSparkExtensions

Here's some example code that you can run in a Spark shell to count the number of keyspaces:

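// The connector import adds cassandraTable to the SparkContext
import com.datastax.spark.connector._
val rdd = sc.cassandraTable("system_schema", "keyspaces")
println("Row count: " + rdd.count)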
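The same table can also be read through the DataFrame API instead of the RDD API. The following is a minimal sketch, assuming it runs in the spark-shell session started above, where `spark` is the predefined SparkSession and the connector package is already on the classpath:

```scala
// Read system_schema.keyspaces as a DataFrame through the
// Spark Cassandra connector's data source.
val df = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "system_schema", "table" -> "keyspaces"))
  .load()

// Inspect the columns Spark inferred from the Cassandra table,
// then count the rows (one row per keyspace).
df.printSchema()
println("Row count: " + df.count())
```

A DataFrame lets you apply Spark SQL operations such as `filter` and `select` to the table.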

Please support the Apache Cassandra community by hovering over the tag and then clicking the Watch tag button. Thanks!