Data virtualization: The missing link in the devops tool chain

How copy data virtualization can help streamline development and testing, while also reducing storage costs

Because applications are the lifeblood of the modern business, there’s a heavy burden on developers to deliver modern, integrated, high-quality applications -- and to do so in a compressed timeframe. Developers need to iterate quickly on new features, functions, and competitive advantages, often measured against Web-scale competitors. By automating testing, deployment, and configuration tasks, devops tools lay the groundwork for higher quality and faster time to market. However, there’s one essential cog in the development process that most devops tools ignore: the data.

Today, one of the largest obstacles, and biggest delays, facing application development organizations is access to the data. I hear over and over from customers that gaining access to a single database copy can entail multiple steps and weeks of waiting. It’s even more difficult when application data sets are measured in terabytes or petabytes. Gaining access to large data sets typically requires multiple teams to provision storage, network, and OS resources before the app developer can get at the data. It’s a technical hurdle, a security issue, and a business problem all rolled into one.

While the process starts with developers requesting access to the data, devops can’t begin until that data is in place. Sources of the data will vary, and the process ends up touching virtual and physical platforms, databases, hypervisors, and operating systems. If storage and compute resources and security permissions are not already in place, the time stretches longer and project costs increase. If the database is very large, the time extends again -- you may be looking at additional infrastructure costs as well.

Critical business applications usually contain sensitive data such as personally identifiable information and protected health information. Sensitive data often means intensive manual work for data masking in order to protect customer privacy, assure security, and frankly reduce legal and PR risks to the business. Assured data control requires a hard line between production and development environments. A fully integrated workflow is needed to guarantee that only the correct, or masked, data is accessible. If masking is skipped or ignored, there can be no confidence that data will be protected from exposure to hacking or potentially harmful leaks.
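
To make the masking step concrete, here is a minimal sketch in Python, using hypothetical column names and a hypothetical salt; it illustrates deterministic masking in general, not any particular product’s implementation. Sensitive values are replaced with salted hashes so the data is unreadable in development, while identical inputs still map to identical tokens and joins across tables continue to work.

    import hashlib

    SENSITIVE_COLUMNS = {"ssn", "email", "phone"}   # hypothetical PII column names
    SALT = b"per-environment-secret"                # kept out of the development environment

    def mask_value(value) -> str:
        # Deterministic: the same input always yields the same token,
        # so foreign keys and joins still line up after masking.
        return hashlib.sha256(SALT + str(value).encode("utf-8")).hexdigest()[:16]

    def mask_row(row: dict) -> dict:
        return {col: mask_value(val) if col in SENSITIVE_COLUMNS and val is not None else val
                for col, val in row.items()}

    # Only the masked row ever crosses the line into development.
    print(mask_row({"id": 42, "email": "jane@example.com", "ssn": "123-45-6789"}))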

For any significant development project, developers need the latest, full copies of production data -- and IT organizations need a fast, efficient, and secure way to provide it to them. In short, we need to do for data what we’ve already done for all of the other components of the development environment. We need a data virtualization platform.

Virtualizing data for devops

Data virtualization is the next, natural step following server and network virtualization. When you can spin up a virtual copy of production data at any time, without extra storage, think about what it could do for development, testing, release, and refresh of applications. Such a data virtualization platform can automate workflows, enable on-demand data access, and deliver provisioning times measured in minutes instead of days. Also, because you’re dealing with virtual rather than full physical copies, data virtualization can reduce the storage capacity required for development and testing. To protect sensitive data and comply with regulations, virtual data processes can incorporate automated data masking tools or scripting to eliminate expensive, time-consuming manual processes.

Scalability, consistency, data control, and ease of use are all fundamental needs when incorporating automation technologies into an existing environment. Automated workflows collapse multiple, disparate, time-consuming steps into single-click tasks, creating the basis for self-service automation and reducing reliance on ad hoc scripts and manual intervention. Orchestration of automated tasks enables rapid manipulation and provisioning of data for development, testing, QA, preproduction, analytics, and dozens of other use cases, each with unique requirements. The impact is to substantially improve quality and accelerate application development and release cycles.
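
As an illustration of what a single-click workflow might look like under the hood, here is a brief Python sketch. The helper functions are stubs with hypothetical names standing in for platform API calls -- not any vendor’s actual API -- but they show how request, refresh, masking, and mounting collapse into one self-service task.

    def refresh_virtual_copy(app: str, point_in_time: str) -> str:
        return f"{app}-image@{point_in_time}"           # stub: select or refresh a virtual copy

    def apply_masking_policy(image: str, policy: str) -> str:
        return f"{image}+{policy}"                      # stub: run the automated masking step

    def mount_copy(image: str, host: str) -> str:
        return f"/mnt/{host}/{image}"                   # stub: present the copy to the target host

    def provision_dev_copy(app: str, point_in_time: str, dev_host: str) -> str:
        """Collapse request, refresh, mask, and mount into one self-service call."""
        image = refresh_virtual_copy(app, point_in_time)
        masked = apply_masking_policy(image, policy="pii-default")
        return mount_copy(masked, host=dev_host)

    print(provision_dev_copy("orders-db", "2015-06-01T02:00", "dev-host-07"))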

Data virtualization means application development can start sooner and move faster. Self-service data access becomes viable. Beyond support for application development, test, and preproduction operations, virtual data platforms may provide capabilities that support the ops side of devops, including backup, disaster recovery, analytics, archive, and other needs. Reduced bandwidth (you’re not pushing full copies across your LAN), storage (you’re making virtual, not full, copies), and licensing costs can translate into considerable infrastructure savings. 

Virtualized data also presents an opportunity to improve enterprise information lifecycles. Faster development means faster time to market. Virtualized data provides better access with better protection, better control, and lower cost. It can also simplify application retirement while preserving the relationship between the data and the applications that produced it.

An example of a data virtualization platform workflow with masking.

How Actifio data virtualization works

Our technology is sophisticated, but at its heart is a “virtual data pipeline” composed of three simple steps -- capture, manage, and use -- that lets you put application data to work for whatever development or other processes your business needs.

First, capture: Actifio captures data from production applications such as Oracle, Exchange, and SAP. It captures data at the block level and in the native application format so that it can be recovered instantly (instead of having to “translate” data from a backup system’s format before use). How the data is captured is governed by administrator-defined SLAs that can be modified with a few clicks.
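
As a rough sketch of what block-level, incremental capture involves -- an illustration in Python, not Actifio’s actual implementation -- the function below hashes fixed-size blocks of a volume and returns only the blocks that changed since the previous capture. How often it runs, and how long the results are retained, would be governed by the administrator-defined SLA.

    import hashlib

    BLOCK_SIZE = 64 * 1024   # illustrative granularity; real systems track changed blocks natively

    def capture_changed_blocks(volume: bytes, previous_hashes: dict):
        """One SLA-driven capture pass: return (changed_blocks, new_hashes)."""
        changed, hashes = {}, {}
        for offset in range(0, len(volume), BLOCK_SIZE):
            block = volume[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha1(block).hexdigest()
            hashes[offset] = digest
            if previous_hashes.get(offset) != digest:
                changed[offset] = block      # only changed blocks move off the production host
        return changed, hashes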

Second, manage: After an application’s data is captured, Actifio creates a “golden master copy,” which is a single, physical copy compatible with any storage infrastructure. Data moves once to the golden master, and based on an application’s SLA, it is then updated incrementally forever with changed blocks from the source production applications. Merging in only changed blocks makes efficient use of bandwidth and storage capacity. According to the predefined SLA, the application data is moved within the system to appropriate resources, retained, and expired.
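
Continuing the illustration, the toy model below shows the “incremental forever” idea: a base copy is written once, each capture contributes only its changed blocks, and deltas that age out of the SLA’s retention window are rolled into the base. It is a sketch of the concept, not of Actifio’s storage engine.

    class GoldenMaster:
        """Toy model of incremental-forever storage: one full copy, then only deltas."""
        def __init__(self, initial_blocks: dict):
            self.base = dict(initial_blocks)   # the single physical copy, written once
            self.deltas = {}                   # capture time -> {offset: changed block}

        def ingest(self, capture_time, changed_blocks: dict) -> None:
            self.deltas[capture_time] = dict(changed_blocks)   # merge only what changed

        def expire(self, retention, now) -> None:
            # Deltas older than the SLA's retention window are folded into the base copy.
            for t in sorted(self.deltas):
                if now - t > retention:
                    self.base.update(self.deltas.pop(t))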

Third, use: Once established, the golden master or an automatically refreshed “live clone” can supply a virtual copy of any application data, from any point in time, for any authorized use, near instantly. This eliminates the disparate, proprietary vendor infrastructure previously devoted to tasks like test data management, backup, DR, replication, and deduplication, so users can easily access and manage the critical data they need, when they need it.
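
The final step can be sketched the same way. Building on the toy model above, image_at assembles the block map for a chosen point in time without copying data, and VirtualCopy gives each consumer a near-instant clone whose reads fall through to the shared image while its writes stay in a private overlay -- which is why many copies consume little extra storage. Again, this illustrates the copy-on-write idea generally, not the product’s implementation.

    def image_at(base: dict, deltas: dict, point_in_time) -> dict:
        """Assemble the block map as of a point in time; nothing is copied from production."""
        image = dict(base)
        for t in sorted(t for t in deltas if t <= point_in_time):
            image.update(deltas[t])
        return image

    class VirtualCopy:
        """Reads fall through to the shared image; writes stay in this clone's private overlay."""
        def __init__(self, shared_image: dict):
            self.shared = shared_image    # referenced, not duplicated -- near-instant to create
            self.overlay = {}             # copy-on-write: only this clone's changes use storage

        def read(self, offset):
            return self.overlay.get(offset, self.shared.get(offset))

        def write(self, offset, block):
            self.overlay[offset] = block  # the golden master and other clones are unaffected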

That’s it -- copy data virtualization.

Data virtualization platform requirements

When evaluating a potential data virtualization platform, key customer considerations always include scalability and performance. Customers want to know whether the platform can handle terabytes to petabytes, and they want performance assurance without any impact on production. Support for multiple operating systems, databases, and physical and virtual systems is another big factor. Customers also look for a management platform that handles data movement, migration, and cloud integration with straightforward control tools. Beyond all of these, they see the opportunity to support new development, ongoing application management, and eventual retirement through simple orchestration tools.

Other key functionality of a data virtualization platform includes:

  • Controls that enable self-service data access without DBA involvement
  • Sensitive data protection controls including data masking and role-based access
  • Consistency groups that coordinate volumes, applications, and data
  • Coordination of database and log files to roll logs forward prior to mount (see the sketch after this list)
  • Automated refresh of data
  • Workflows to automate processes
  • Near-instant creation of multiple copies with minimal storage consumption
  • Efficient protection for dev & QA test data sets
  • Masked in-cloud compute testing
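
To illustrate the log-rolling requirement above, the sketch below (Python, with a hypothetical log catalog format of sequence number, start time, and end time per archived log) selects the database logs that must be replayed, in sequence order, to bring a mounted image from its capture time forward to the requested point in time before it is handed to a developer.

    def logs_to_roll(logs: list, capture_time, target_time) -> list:
        """Pick the archived logs needed to roll a mounted image from capture_time to target_time."""
        needed = [log for log in logs
                  if log["end_time"] > capture_time and log["start_time"] <= target_time]
        return sorted(needed, key=lambda log: log["sequence"])   # replay in sequence order, then mount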

Data virtualization platforms that meet these requirements, such as Actifio, help to fulfill the devops promise by eliminating data provisioning as a bottleneck to development and testing, while at the same time reducing the amount of compute and storage resources required to deliver full copies of application data to developers. Developers gain self-service and greater control over their timelines, while the business reaps the benefits of more rapid development, faster updates and bug fixes, and more frequently refreshed deployments.

Data virtualization offers ROI potential as well. By decoupling application data from infrastructure, data virtualization platforms can deliver backup, business continuity, and immediate data access not only for test and development, but also for business intelligence, advanced analytics, and other use cases. At the same time, they eliminate redundant data copies, consolidate overlapping storage services, and centralize the basic functions of copy, store, move, and optimize for all applications.

The result is less data moving across networks, less data to store, greater efficiency in long-term retention, substantially reduced storage costs, and the elimination of costly operational complexity. For many reasons, data virtualization is a technology whose time has finally come.

David Chang is technical co-founder and senior vice president of solutions development at Actifio.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.

Copyright © 2015 IDG Communications, Inc.