Redis module speeds Spark-powered machine learning

The newly released Redis-ML component for the popular in-memory data store accelerates machine learning functions with Apache Spark

In-memory data store Redis recently acquired a module architecture to expand functionality. The latest module is a machine learning add-on that accelerates the delivery of results from trained models rather than the training itself.

Redis-ML, or the Redis Module for Machine Learning, comes courtesy of the commercial outfit that drives Redis development, Redis Labs. It speeds the execution of machine learning models while still allowing those models to be trained in familiar ways. Redis works as an in-memory cache backed by disk storage, and its creators claim machine learning models can be executed orders of magnitude more quickly with it.

The module works in conjunction with Apache Spark, another in-memory data-processing tool with machine learning components. Spark handles the data-gathering phase, and Redis plugs into the Spark cluster through the pre-existing Redis Spark-ML module. The model generated by Spark's training is then saved to Redis, rather than to an Apache Parquet or HDFS data store. To execute the model, you run queries against the Redis-ML module, not Spark itself.
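The split described above (train in Spark, save the model to Redis, query Redis) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a plain dictionary stands in for the Redis server, the `ModelStore` class and its method names are hypothetical, and the command names mentioned in the comments (`ML.LINREG.SET`, `ML.LINREG.PREDICT`) are assumed from the Redis-ML module's documentation and should be checked against the module itself. The weights are invented numbers, not the output of a real Spark job.

```python
from typing import Dict, List, Sequence


class ModelStore:
    """Hypothetical in-process stand-in for a Redis server with Redis-ML loaded."""

    def __init__(self) -> None:
        self._models: Dict[str, List[float]] = {}

    def linreg_set(self, key: str, intercept: float, coefficients: Sequence[float]) -> None:
        # Roughly what "ML.LINREG.SET key intercept coeff [coeff ...]" would do
        # (command name assumed): keep the trained parameters under one key.
        self._models[key] = [intercept, *coefficients]

    def linreg_predict(self, key: str, features: Sequence[float]) -> float:
        # Roughly what "ML.LINREG.PREDICT key feature [feature ...]" would do
        # (command name assumed): evaluate the stored model inside the store.
        intercept, *coefficients = self._models[key]
        if len(features) != len(coefficients):
            raise ValueError("feature count does not match the stored model")
        return intercept + sum(c * x for c, x in zip(coefficients, features))


# Step 1: training happens elsewhere (Spark, in the article); only the
# resulting parameters matter here. These values are invented.
trained_intercept = 2.0
trained_coefficients = [0.5, -1.0]

# Step 2: save the model to the store instead of to Parquet or HDFS.
store = ModelStore()
store.linreg_set("price_model", trained_intercept, trained_coefficients)

# Step 3: applications query the store, not the training cluster.
prediction = store.linreg_predict("price_model", [4.0, 1.0])
print(prediction)
```

The point of the design is that the serving path never touches the training cluster: once the parameters sit under a key, any client that can reach the store can request a prediction.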

In the big picture, Redis-ML offers speed: faster responses to individual queries, smaller penalties for large numbers of users making requests, and the ability to provide high availability of the results via a scale-out Redis setup. Redis Labs claims the prediction process shows "5x to 10x latency improvement over the standard Spark solution in real-time classifications."

Another boon is specifically for developers, as Redis-ML interoperates with Scala, JavaScript (via Node.js), Python, and the .NET languages. Models "are no longer restricted to the language they were developed in," states Redis Labs, but "can be accessed by applications written in different languages concurrently using the simple [Redis-ML] API." Redis Labs also claims the resulting trained model is easier to deploy, since it can be accessed through said APIs without custom code or infrastructure.
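One reason a model stored this way is reachable from many languages is that Redis clients all speak the same wire protocol, RESP, in which a command is an array of length-prefixed strings. The following Python sketch encodes one prediction request by hand to show how little a client needs to do. The helper name `encode_resp_command` and the key `price_model` are hypothetical, and the command name `ML.LINREG.PREDICT` is assumed from the Redis-ML documentation rather than confirmed by the article.

```python
from typing import Sequence, Union

Arg = Union[str, int, float]


def encode_resp_command(args: Sequence[Arg]) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    # Array header: "*<number of arguments>\r\n"
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = str(arg).encode("utf-8")
        # Bulk string: "$<byte length>\r\n<bytes>\r\n"
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


# Command name and key are assumptions for illustration only.
request = encode_resp_command(["ML.LINREG.PREDICT", "price_model", 4.0, 1.0])
print(request)
```

Any language with a socket library can produce these bytes, which is why existing Redis clients for Scala, Node.js, Python, or .NET would not need model-specific code to issue such a request.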

Accelerating Spark with other technologies isn't a new idea. Previously, the approach was to speed up the storage back ends that Spark talks to. In fact, Redis' engineers herald it as one such solution. Another project, Apache Arrow, speeds the execution of Spark (and other big data projects) by transforming data into a columnar format that can be processed more efficiently.

Redis Labs is pushing Redis as a broad solution to these problems, since its architecture (what its creators call a "structure store") permits more free-form storage than competing database solutions. Redis Labs VP of Product Management Cihan Biyikoglu noted in a phone interview that other databases attempt to adapt data types to the problems at hand; Redis, by contrast, instead of "shackling [you] to one data model, type, or representation," allows "an abstraction that can house any type of data."

If Redis Labs has a long-term plan, it's to inch Redis toward becoming an all-in-one solution for machine learning -- to provide a data-gathering and data-querying mechanism along with the machine learning libraries under one roof. To wit: Another Redis module, for Google's TensorFlow framework, not only allows Redis to serve as backing for TensorFlow, but allows training TensorFlow models directly inside Redis.

Serdar Yegulalp is a senior writer at InfoWorld, focused on machine learning, containerization, devops, the Python ecosystem, and periodic reviews.