Spark Closes in on Real-Time Processing with Redis Pairing

Feb 2nd, 2016 11:00am

Redis Labs has released a connector that would allow the Spark data processing platform to use the Redis in-memory data store.

Using Redis for Spark will allow users to “store a huge amount of data without paying a significant amount of money for infrastructure,” explained Yiftach Shoolman, co-founder and Chief Technology Officer of Redis Labs, noting that Redis can be a lower cost alternative to a full-fledged in-memory database system. “Today we want the big data performance to be as close to real-time as possible. That is what we try to do.”

Specifically, the open source Spark-Redis connector package provides an easy way to run Spark SQL queries against data stored on Redis.

According to benchmarks from Redis Labs, running Spark against a Redis data store can speed processing by 135 times compared to using HDFS (the Hadoop Distributed File System), and by 45 times compared to using the Tachyon in-memory data store.


Redis Labs is eager to make Redis the de facto data store for Spark, Shoolman asserted.

The package is a library for writing to and reading from a Redis cluster. It exposes all of Redis’ data structures – strings, hashes, lists, sets, sorted sets, bitmaps and HyperLogLogs – as Spark RDDs (Resilient Distributed Datasets) or through the Spark Dataset API.
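As a conceptual illustration only, the Python sketch below shows one way Redis values of different types could be flattened into the row-like records that a distributed collection such as an RDD holds. It does not use the Spark-Redis connector or a Redis client: the `flatten` function, the `keyspace` dictionary and the record shapes are all hypothetical stand-ins, and the connector's real API may differ.

```python
from typing import Any, Iterator, Tuple


def flatten(key: str, kind: str, value: Any) -> Iterator[Tuple]:
    """Yield flat records for one key (hypothetical record shapes)."""
    if kind == "string":
        yield (key, value)
    elif kind == "hash":
        # One record per field, sorted for a stable order.
        for field, field_value in sorted(value.items()):
            yield (key, field, field_value)
    elif kind == "list":
        # Lists are ordered, so keep the position of each item.
        for index, item in enumerate(value):
            yield (key, index, item)
    elif kind == "set":
        for member in sorted(value):
            yield (key, member)
    elif kind == "zset":
        # Sorted sets are ordered by score, then by member name.
        for member, score in sorted(value.items(), key=lambda kv: (kv[1], kv[0])):
            yield (key, member, score)
    else:
        raise ValueError(f"unsupported kind: {kind}")


# In-memory stand-in for a Redis keyspace: key -> (type, value).
keyspace = {
    "user:1:name": ("string", "ada"),
    "user:1": ("hash", {"lang": "scala", "city": "london"}),
    "queue": ("list", ["a", "b"]),
    "tags": ("set", {"spark", "redis"}),
    "scores": ("zset", {"ada": 2.0, "bob": 1.0}),
}

records = [
    record
    for key, (kind, value) in keyspace.items()
    for record in flatten(key, kind, value)
]

for record in records:
    print(record)
```

Once every value is reduced to flat tuples like these, ordinary collection operations (filtering, mapping, grouping by key) apply uniformly regardless of the original Redis type.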

The library minimizes the overhead that occurs with serialization and deserialization of large amounts of data.

Spark itself has emerged as the chief successor to the Hadoop data processing platform thanks in no small part to an ability to process data in near-real time, rather than the batch processing of ‘big data’ that Hadoop originally offered.

“Apache Spark is becoming a default in-memory engine for high-performance data integration and analytics,” said Matt Aslett, research director, data platforms and analytics at 451 Research, in a statement. “The combination of Redis and Spark should enable high-performance, real-time analytics with extremely large and variable datasets.”
