Data Hero 4 U: Enable Distributed Data Processing for Cassandra With Spark

Wednesday, October 04, 2017

DZone Database Zone

Cassandra is a distributed database system that offers linearly scalable performance and high availability over a cluster of commodity servers. As in many other distributed storage systems, a distributed data model (data partitioning) is the primary technique Cassandra uses to achieve scalable performance and fault tolerance. In Hadoop, the distributed data model of HDFS brings another benefit: MapReduce, a distributed programming model that allows parallel data processing on its data partitions (data blocks). In the Hadoop ecosystem, this is commonly described as "bring computation closer to data."

Is a similar data processing model possible with Cassandra? That is, can we take advantage of Cassandra's distributed nature and apply data processing logic in parallel on each data partition? Yes, that's possible with the DataStax Spark-Cassandra Connector, which provides an RDD abstraction for data collections in Cassandra.
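
To make the idea of partition-local processing concrete, here is a minimal toy sketch in plain Python. It does not use Spark, Cassandra, or the connector's API. The row layout, the `user` partition key, the partition count, and every function name are assumptions made up for illustration, and `zlib.crc32` is only a stand-in for a real partitioner. It shows the general pattern: rows are grouped by a hash of their partition key, the same logic runs on each partition independently, and the partial results are merged at the end.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import zlib

NUM_PARTITIONS = 3  # hypothetical number of data partitions (think: nodes)


def partition_for(key: str) -> int:
    """Map a partition key to a partition index with a stable hash.

    crc32 is a stand-in chosen for illustration, not Cassandra's partitioner.
    """
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def partition_rows(rows):
    """Group rows so that all rows with the same key land in one partition."""
    partitions = [[] for _ in range(NUM_PARTITIONS)]
    for row in rows:
        partitions[partition_for(row["user"])].append(row)
    return partitions


def count_events(partition):
    """Per-partition logic: count events per user using only local rows."""
    return Counter(row["user"] for row in partition)


def process_in_parallel(rows):
    """Run count_events on every partition concurrently, then merge."""
    partitions = partition_rows(rows)
    with ThreadPoolExecutor(max_workers=NUM_PARTITIONS) as pool:
        partials = list(pool.map(count_events, partitions))
    total = Counter()
    for partial in partials:
        total.update(partial)  # adds counts from each partition
    return dict(total)


if __name__ == "__main__":
    sample = [
        {"user": "alice", "event": "login"},
        {"user": "bob", "event": "login"},
        {"user": "alice", "event": "click"},
    ]
    print(process_in_parallel(sample))
```

Because every row for a given key lives in exactly one partition, the per-partition step never needs data from another partition, which is what lets the computation move to the data rather than the other way around.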
