The defining feature of data virtualization is that the data remains in its original location, and real-time access is established so that analytics can run across multiple sources. This helps resolve several technical difficulties: it avoids compatibility problems when combining data from different platforms, lowers the risk of errors caused by faulty data, and ensures that the newest data is used. Furthermore, avoiding the creation of a new database containing personal information can make it easier to comply with privacy regulations. As a result, data virtualization creates new possibilities for data use.[4]
Building on this, the real value of data virtualization, particularly for users, lies in its declarative approach. Traditional data integration methods require every step of the integration to be specified, which is tedious and error-prone, especially when requirements change and several steps must be modified. Data virtualization, in contrast, lets users simply describe the desired outcome; the software then automatically generates the steps needed to achieve it. If the desired outcome changes, updating the description suffices, and the software adjusts the intermediate steps accordingly. This flexibility can accelerate processes by up to five times, underscoring the primary advantage of data virtualization.[5]
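The contrast between describing an outcome and scripting each step can be illustrated with a minimal Python sketch. Everything in it is hypothetical and chosen for illustration: the `ViewSpec` class, the `compile_view` function, and the table and column names do not come from any particular product. A real virtualization engine would also plan and optimize the query; this sketch only generates a single SQL statement from a description.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ViewSpec:
    """Declarative description of a desired result (hypothetical format)."""
    name: str
    columns: dict[str, str]                   # output column -> source expression
    base: str                                 # first source table, with alias
    joins: tuple[tuple[str, str], ...] = ()   # (table with alias, join condition)


def compile_view(spec: ViewSpec) -> str:
    """Generate the SQL step that realizes the description."""
    select_list = ", ".join(
        f"{expr} AS {alias}" for alias, expr in spec.columns.items()
    )
    join_clauses = "".join(
        f" JOIN {table} ON {condition}" for table, condition in spec.joins
    )
    return (
        f"CREATE VIEW {spec.name} AS "
        f"SELECT {select_list} FROM {spec.base}{join_clauses}"
    )


# The user states what the result should look like ...
orders_view = ViewSpec(
    name="customer_orders",
    columns={"customer": "c.name", "amount": "o.amount"},
    base="customers AS c",
    joins=(("orders AS o", "o.customer_id = c.id"),),
)

# ... and changing the outcome means changing only the description.
orders_with_date = replace(
    orders_view,
    columns={**orders_view.columns, "ordered_on": "o.order_date"},
)

if __name__ == "__main__":
    print(compile_view(orders_view))
    print(compile_view(orders_with_date))
```

Adding the `ordered_on` column required no change to `compile_view`: the generated statement follows from the updated description, which is the property the declarative approach relies on.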
However, because data virtualization keeps no local copy of the data, the connections to all required data sources must be operational, which is one of the main drawbacks of the approach. Connection problems occur more often in complex systems, where one or more crucial sources will occasionally be unavailable. Smart data buffering, such as keeping the data from the most recent few requests in the virtualization system's buffer, can help to mitigate this issue.[6]
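One way such buffering might work is sketched below in Python. The `BufferedSource` class, its `capacity` parameter, and the choice to signal an outage with `ConnectionError` are all assumptions made for illustration, not a description of any specific product. The wrapper remembers the results of the most recent requests and serves them, flagged as not live, only when the source cannot be reached.

```python
from collections import OrderedDict
from typing import Callable


class BufferedSource:
    """Wrap a fetch function and keep the results of the most recent requests."""

    def __init__(self, fetch: Callable[[str], list], capacity: int = 3):
        self._fetch = fetch          # function that queries the live source
        self._capacity = capacity    # number of recent requests to remember
        self._buffer: OrderedDict[str, list] = OrderedDict()

    def query(self, request: str) -> tuple[list, bool]:
        """Return (rows, is_live); is_live is False for buffered results."""
        try:
            rows = self._fetch(request)
        except ConnectionError:
            # Source unavailable: fall back to the buffer if this request
            # was answered recently, otherwise propagate the failure.
            if request in self._buffer:
                self._buffer.move_to_end(request)
                return self._buffer[request], False
            raise
        # Source reachable: refresh the buffer and evict the oldest entries.
        self._buffer[request] = rows
        self._buffer.move_to_end(request)
        while len(self._buffer) > self._capacity:
            self._buffer.popitem(last=False)
        return rows, True
```

Returning the `is_live` flag alongside the rows lets callers decide whether possibly stale data is acceptable, since a buffered answer gives up the guarantee that the newest data is used.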
Moreover, because data virtualization solutions may use large numbers of network connections to read the original data and to serve virtualized tables to other solutions over the network, system security requires more consideration than it does with traditional data lakes. In a conventional data lake system, data can be imported into the lake by following specific procedures in a single environment. A virtualization system, by contrast, must establish a separate secure connection with each data source, which is typically located in a different environment from the virtualization system itself.[7]
Security of personal data and compliance with regulations can be a major issue when introducing new services or attempting to combine various data sources. When data is delivered for analysis, data virtualization can help to resolve privacy-related problems. Virtualization makes it possible to combine personal data from different sources without physically copying it to another location, while also restricting access to all other collected variables. However, virtualization does not eliminate the requirement to confirm the security and privacy of the analysis results before making them more widely available. Regardless of the chosen data integration method, all results based on personal-level data should be protected in line with the appropriate privacy requirements.[8]
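The idea of combining personal data while exposing only selected variables can be sketched with Python's built-in `sqlite3` module. The two attached in-memory databases stand in for separate source systems, and all schema, table, and column names are hypothetical. Note that in this sketch the view only narrows what an analysis query sees; it does not by itself prevent direct access to the underlying tables, which a real deployment would have to enforce through permissions.

```python
import sqlite3


def open_restricted_session() -> sqlite3.Connection:
    """Combine two personal-data sources behind a view with limited columns."""
    conn = sqlite3.connect(":memory:")
    # Two independent stand-in sources (hypothetical names and schemas).
    conn.execute("ATTACH DATABASE ':memory:' AS registry")
    conn.execute("ATTACH DATABASE ':memory:' AS clinic")
    conn.execute(
        "CREATE TABLE registry.persons "
        "(person_id INTEGER, name TEXT, birth_year INTEGER)"
    )
    conn.execute(
        "CREATE TABLE clinic.visits "
        "(person_id INTEGER, diagnosis TEXT, notes TEXT)"
    )
    conn.executemany(
        "INSERT INTO registry.persons VALUES (?, ?, ?)",
        [(1, "Ada", 1980), (2, "Grace", 1975)],
    )
    conn.executemany(
        "INSERT INTO clinic.visits VALUES (?, ?, ?)",
        [(1, "asthma", "private note"), (2, "flu", "private note")],
    )
    # The join uses person_id internally, but only the two variables needed
    # for the analysis are exposed; names and notes are left out.
    conn.execute(
        """
        CREATE TEMP VIEW analysis_view AS
        SELECT p.birth_year AS birth_year, v.diagnosis AS diagnosis
        FROM registry.persons AS p
        JOIN clinic.visits AS v ON v.person_id = p.person_id
        """
    )
    return conn


def exposed_columns(conn: sqlite3.Connection, view: str) -> list:
    """List the column names that a view makes visible."""
    cursor = conn.execute(f"SELECT * FROM {view} LIMIT 0")
    return [column[0] for column in cursor.description]
```

Calling `exposed_columns(conn, "analysis_view")` on the returned connection lists only `birth_year` and `diagnosis`. The results of any analysis built on the view would still need the privacy review described above before wider release.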
Some enterprise landscapes are filled with disparate data sources, including multiple data warehouses, data marts, and/or data lakes, even though a data warehouse, if implemented correctly, should be unique and serve as a single source of truth. Data virtualization can efficiently bridge data across data warehouses, data marts, and data lakes without having to create a whole new integrated physical data platform. The existing data infrastructure can continue performing its core functions while the data virtualization layer simply leverages the data from those sources. This makes data virtualization complementary to all existing data sources and increases the availability and usage of enterprise data.
Data virtualization may also be considered as an alternative to ETL and data warehousing, but for performance reasons it is generally not recommended for very large data warehouses. Data virtualization is inherently aimed at producing quick and timely insights from multiple sources without having to embark on a major data project with extensive ETL and data storage. However, data virtualization may be extended and adapted to serve data warehousing requirements as well. This requires an understanding of the data storage and history requirements, along with planning and design to incorporate the right data virtualization, integration, and storage strategies and infrastructure/performance optimizations (e.g., streaming, in-memory, hybrid storage).
Data virtualization software provides some or all of the following capabilities:[12]
Data virtualization software may include functions for development, operation, and/or management.
A metadata engine collects, stores and analyzes information about data and metadata (data about data) in use within a domain.[13]
Benefits include:
Drawbacks include:
Avoid usage:
Enterprise information integration (EII), a term first coined by Metamatrix (whose product is now known as Red Hat JBoss Data Virtualization), and federated database systems are terms used by some vendors to describe a core element of data virtualization: the capability to create relational JOINs in a federated VIEW.
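A relational JOIN in a federated VIEW can be illustrated in miniature with Python's built-in `sqlite3` module. This is only a single-process stand-in under stated assumptions: the two attached in-memory databases (`crm` and `billing`, hypothetical names) play the role of separate source systems, whereas real federation spans networked and often heterogeneous systems. SQLite allows only a `TEMP` view to reference tables in other attached databases, which is why the view is created that way.

```python
import sqlite3


def build_federated_view() -> sqlite3.Connection:
    """Join two separate databases through a view, without copying rows."""
    conn = sqlite3.connect(":memory:")
    # Each ATTACH creates an independent in-memory database that stands in
    # for a separate source system.
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("ATTACH DATABASE ':memory:' AS billing")
    conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace")],
    )
    conn.executemany(
        "INSERT INTO billing.invoices VALUES (?, ?)",
        [(1, 100.0), (1, 50.0), (2, 75.0)],
    )
    # The view stores only the query; the join runs against the sources
    # each time the view is read.
    conn.execute(
        """
        CREATE TEMP VIEW customer_revenue AS
        SELECT c.name AS name, SUM(i.amount) AS total
        FROM crm.customers AS c
        JOIN billing.invoices AS i ON i.customer_id = c.id
        GROUP BY c.id, c.name
        """
    )
    return conn


def revenue_by_customer(conn: sqlite3.Connection) -> list:
    """Read the federated view, ordered by customer name."""
    return conn.execute(
        "SELECT name, total FROM customer_revenue ORDER BY name"
    ).fetchall()
```

Because the view holds no data of its own, a row inserted into `billing.invoices` after the view was created appears in the next call to `revenue_by_customer`.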
Some data virtualization solutions and vendors:
Another, more up-to-date list with user rankings is compiled by Gartner.[37]
"What is Data Virtualization?", Margaret Rouse, TechTarget.com, retrieved 19 August 2013 http://searchdatamanagement.techtarget.com/definition/data-virtualization ↩
"Streamlining Customer Data". Archived from the original on 2017-03-24. Retrieved 2017-03-24. https://web.archive.org/web/20170324174346/http://www.ardentisys.com/stories/streamlining-customer-data ↩
"Data virtualisation on rise as ETL alternative for data integration" Gareth Morgan, Computer Weekly, retrieved 19 August 2013 http://www.computerweekly.com/feature/Data-virtualisation-on-rise-as-ETL-alternative-for-data-integration ↩
Paiho, Satu; Tuominen, Pekka; Rökman, Jyri; Ylikerälä, Markus; Pajula, Juha; Siikavirta, Hanne (2022). "Opportunities of collected city data for smart cities". IET Smart Cities. 4 (4): 275–291. doi:10.1049/smc2.12044. S2CID 253467923. https://doi.org/10.1049%2Fsmc2.12044 ↩
"The True Value of Data Virtualization: Beyond Marketing Buzzwords", Nick Golovin, medium.com, retrieved 14 November 2023 https://medium.com/@Nick_Golovin/the-true-value-of-data-virtualization-beyond-marketing-buzzwords-7acb4e12b100 ↩
"Hammerspace - A True Global File System". Hammerspace. Retrieved 2021-10-31. https://hammerspace.com/ ↩
Summan, Jesse; Handmaker, Leslie (2022-12-20). "Data Federation vs. Data Virtualization". StreamSets. Retrieved 2024-02-08. https://streamsets.com/blog/data-federation-vs-data-virtualization/ ↩
Kendall, Aaron. "Metadata-Driven Design: Designing a Flexible Engine for API Data Retrieval". InfoQ. Retrieved 25 April 2017. https://www.infoq.com/articles/mdd-api-data-retrieval ↩
"Rapid Access to Disparate Data Across Projects Without Rework" Informatica, retrieved 19 August 2013 http://www.informatica.com/us/products/data-virtualization/data-services/ ↩
Data virtualization: 6 best practices to help the business 'get it' Joe McKendrick, ZDNet, 27 October 2011 https://www.zdnet.com/article/data-virtualization-6-best-practices-to-help-the-business-get-it/ ↩
"IT pros reveal benefits, drawbacks of data virtualization software" Mark Brunelli, SearchDataManagement, 11 October 2012 https://web.archive.org/web/20121019201702/http://searchdatamanagement.techtarget.com/news/2240165242/IT-pros-reveal-the-benefits-drawbacks-of-data-virtualization-software ↩
"The Pros and Cons of Data Virtualization" Archived 2014-08-05 at the Wayback Machine Loraine Lawson, BusinessEdge, 7 October 2011 http://www.itbusinessedge.com/cm/blogs/lawson/the-pros-and-cons-of-data-virtualization/?cs=48794 ↩
"Analyticscreator - The Ultimate Toolbox for Data Enigneers". www.analyticscreator.com. Retrieved 2024-08-27. https://www.analyticscreator.com ↩
"IBM Data Virtualization". www.ibm.com. Retrieved 2024-04-09. https://www.ibm.com/products/watson-query ↩
https://www.actifio.com/company/blog/post/enterprise-data-service-new-copy-data-virtualization/ ↩
"Ultrawrap - Semantic Web Standards". www.w3.org. Retrieved 2024-04-09. https://www.w3.org/2001/sw/wiki/Ultrawrap ↩
"Data Virtuality - Integrate data for better-informed decisions". Data Virtuality. Retrieved 2024-04-09. https://datavirtuality.com/en/ ↩
"My Blog – My WordPress Blog". 2023-09-19. Retrieved 2024-04-09. https://datawerks.com/ ↩
"The industry leading data company for DevOps". Delphix. Retrieved 2024-04-09. https://www.delphix.com/ ↩
"Denodo is a leader in data management". Denodo. 2014-09-03. Retrieved 2024-04-09. https://www.denodo.com/en/page/denodo-home ↩
https://query.prod.cms.rt.microsoft.com/cms/api/am/binary/RWJFdq ↩
"Home". Querona Data Virtualization. Retrieved 2024-04-09. https://www.querona.io/ ↩
"Getting Started Guide Red Hat JBoss Data Virtualization 6.4 | Red Hat Customer Portal". access.redhat.com. Retrieved 2024-04-09. https://access.redhat.com/documentation/en-us/red_hat_jboss_data_virtualization/6.4/html-single/getting_started_guide/index ↩
"Stone Bond Technologies | Advanced Data Integration Platform Solution". Stone Bond Technologies. Retrieved 2024-04-09. https://stonebond.com/ ↩
"Stratio Business Semantic Data Layer delivers 99% answer accuracy for LLMs". Stratio. 2024-01-15. Retrieved 2024-04-09. https://www.stratio.com/blog/stratio-business-semantic-data-layer-delivers-99-answer-accuracy-for-llms/ ↩
"Teiid". teiid.io. Retrieved 2024-04-09. https://teiid.io/ ↩
"Managing the Veritas provisioning file system (VPFS) configuration parameters | Managing NetBackup services from the deduplication shell | Accessing NetBackup WORM storage server instances for management tasks | Managing NetBackup application instances | NetBackup™ 10.2.0.1 Application Guide | Veritas™". www.veritas.com. Retrieved 2024-04-09. https://www.veritas.com/support/en_US/doc/141196447-161587232-0/v160534095-161587232 ↩
"XAware Data Integration Project". SourceForge. 2016-04-06. Retrieved 2024-04-09. https://sourceforge.net/projects/xaware/ ↩
"Best Data Virtualization Reviews". Gartner. 2024. Retrieved 2024-02-07. https://www.gartner.com/reviews/market/data-virtualization ↩