Global Big Data Conference


Machine learning: From science project to business plan

Posted on: Jan 2, 2017

In 2016, machine learning became more widespread, powered by better hardware, and democratized through as-a-service APIs

2015 was the year machine learning emerged from the academic closet. No longer was it an esoteric discipline commanded by the few, the proud, the data scientists. Now it was, in theory, everyone’s business.

2016 was the year theory became practice. Machine learning’s power and promise, and all that surrounded and supported it, moved more firmly into the enterprise development mainstream.


That movement revolved around three trends: new and improved tool kits for machine learning, better hardware (and easier access to it), and more cloud-hosted, as-a-service variants of machine learning that provided both open source and proprietary tools.

1. New, revamped tool kits and frameworks helped lighten the load

Once upon a time, if you wanted to implement machine learning in an app, you had to roll the algorithms yourself. Eventually, third-party libraries arrived that saved you the trouble of reinventing the wheel, but they still required a lot of heavy lifting to be productive. Now the state of the art involves frameworks designed to make machine learning an assembly-line process: Data in one end, trained models and useful results out the other.

What better way to implement such pipelines than through existing data-processing frameworks? To that end, the superhot (and superfast) data framework Spark not only boosted its performance with its 2.0 release but also made its revised, DataFrame-based machine learning library (spark.ml) the primary one, a better fit for Spark’s new internal architecture.
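To make the “assembly line” idea concrete, here is a minimal, framework-free sketch in Python. It is not Spark’s API or any other library’s; the `Scaler`, `LinearModel`, and `Pipeline` classes are hypothetical and only mimic the fit-then-transform pattern that pipeline-style frameworks use. Raw data goes in one end, each preprocessing stage learns its parameters and transforms the data, and a trained model that produces predictions comes out the other.

```python
from statistics import mean


class Scaler:
    """Hypothetical preprocessing stage: centers a single feature on its mean."""

    def fit(self, xs, ys):
        self.center = mean(xs)  # learn the parameter from training data
        return self

    def transform(self, xs):
        return [x - self.center for x in xs]


class LinearModel:
    """Hypothetical final stage: one-variable least-squares fit y = slope*x + intercept."""

    def fit(self, xs, ys):
        x_bar, y_bar = mean(xs), mean(ys)
        num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        den = sum((x - x_bar) ** 2 for x in xs)
        self.slope = num / den
        self.intercept = y_bar - self.slope * x_bar
        return self

    def predict(self, xs):
        return [self.slope * x + self.intercept for x in xs]


class Pipeline:
    """Chains preprocessing stages and a final model: data in, trained model out."""

    def __init__(self, stages, model):
        self.stages = stages
        self.model = model

    def fit(self, xs, ys):
        # Each stage is fitted on the output of the previous one.
        for stage in self.stages:
            xs = stage.fit(xs, ys).transform(xs)
        self.model.fit(xs, ys)
        return self

    def predict(self, xs):
        # New data passes through the same already-fitted stages.
        for stage in self.stages:
            xs = stage.transform(xs)
        return self.model.predict(xs)


pipe = Pipeline([Scaler()], LinearModel()).fit([1, 2, 3, 4], [3, 5, 7, 9])
print(pipe.predict([5]))  # → [11.0]
```

The point of the design is that the caller never touches the intermediate steps: the same `fit` and `predict` calls work no matter how many stages sit in the chain.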

Another trend in the same vein: Products that handled data, but previously didn’t have a direct connection to machine learning, started offering machine learning acceleration as a new feature. Redis, the in-memory data caching system that pulls double duty as a database, added Spark-powered machine learning as one application for its new modular architecture.

A third trend in the field is the rise of new support tools for developing machine learning software. Sometimes it’s an entirely new language; for example, Lift was created for writing high-speed, parallel algorithms that run well on CPUs, GPUs, and other hardware. Other times it’s tool kits for existing languages; to wit, Milk enhances C/C++ applications that use the OpenMP tool set, speeding up access to big data sets.

2. GPUs and custom hardware arrived in force in the cloud—and everywhere else

Machine learning wasn’t made possible by the blisteringly fast computational power of GPUs, but on the highly parallel math that model training demands, GPUs deliver a performance boost that current-generation CPUs can’t even begin to approach.

Two big movements in machine learning in 2016 involved GPUs. First was the accelerating use of GPUs in machine learning products, and not only in frameworks like Spark. GPU speedups also started getting more notice in database applications, especially those marketed as a way to feed data-hungry machine learning systems.

The other big GPU-related change was that every major cloud vendor could now boast GPU-accelerated instances as part of its product lineup. With cloud-hosted GPUs, customers can buy as much processing power as a machine learning training job needs, at a scale that would be prohibitively difficult to reach with in-house GPU-powered machine learning rigs.

Amazon already had GPU-powered instances to its name, but added a more flexible approach: You could add or remove GPU processing from instances, rather than having to buy an instance with GPU processing as part of the package. Google, however, provided attach/detach functionality right out of the gate when it rolled out its first GPU-powered instances.

Microsoft Azure already offered GPUs as part of its cloud product lineup, but hinted at intriguing future directions for user-programmable hardware in its datacenters. FPGAs, a class of high-speed programmable hardware, are currently in use to speed up networking in Azure, but Microsoft’s long-term plan is to offer access to similar devices to juice computation-intensive apps like—you guessed it—machine learning. (Amazon is also cooking up similar plans.)

One possible drawback to GPUs in the cloud: You don’t always get cutting-edge hardware at your disposal. When Amazon added new GPU instance types in September, it stuck with previous-generation GPU hardware, most likely to offer a well-understood product rather than a newer but less familiar one.

3. Cloud-hosted algorithms democratized machine learning, but at a cost

“Democratizing AI”—those were the words Microsoft used to describe its mission to bring machine learning resources to everyone through the cloud. It’s not a bad way to sum up what the other cloud giants aimed for as well: Provide tools for creating intelligent software that are as painless to leverage as any other API.

“Artificial intelligence as a service” is another way to put it. As with other as-a-service offerings, the cloud does the heavy lifting: not only provisioning the systems, but also training the models and hosting the data used for training. If you don’t already have the data in the cloud, new solutions abound for getting it there, like Amazon’s 100-petabytes-in-a-truck “Snowmobile.”
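Some back-of-the-envelope arithmetic shows why shipping storage by truck can beat the network at this scale. The sketch below assumes decimal units (1 PB = 10^15 bytes, 1 Gbit = 10^9 bits) and a single 10 Gbit/s link running flat out with zero protocol overhead; that link speed is an illustrative assumption, not a figure from Amazon, and `transfer_days` is a hypothetical helper.

```python
def transfer_days(petabytes, gigabits_per_second):
    """Days needed to push `petabytes` of data over a link of the given speed.

    Assumes decimal units and a fully utilized link with no overhead.
    """
    bits = petabytes * 10**15 * 8                      # bytes -> bits
    seconds = bits / (gigabits_per_second * 10**9)     # bits / (bits per second)
    return seconds / 86_400                            # seconds -> days


days = transfer_days(100, 10)
print(f"{days:.0f} days (~{days / 365:.1f} years)")  # → 926 days (~2.5 years)
```

Even a 100 Gbit/s link would need roughly 93 days under the same idealized assumptions.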

In many cases, you can skip the training and go right to a prefab API for everything you need. Such APIs emphasize convenience over transparency of function: requests in, intelligence out. For many people, that’s ideal, since it minimizes the amount of work involved. But it also means the mechanisms that produce the answers are more opaque.

To get around this, you can rely on cloud-hosted versions of existing, familiar tools. Spark is one such tool, hosted both by its creators (Databricks) and by third parties like IBM and Microsoft in their respective clouds.

The plus side of this arrangement: You get to pick the route that’s best suited to your needs. The black-box API route should provide useful enough results for those who need no more than the commoditized version of machine learning. But those in the business of pushing the envelope will likely want to roll their own machine-learning-powered solutions, and the bar is set to be raised further for both paths in the coming year.
