Databricks, the Data and AI company, announced it has agreed to acquire Tabular, a data management company founded by Ryan Blue, Daniel Weeks, and Jason Reid. By bringing together the original creators of Apache Iceberg and Linux Foundation Delta Lake, the two leading open source lakehouse formats, Databricks will lead the way on data compatibility so that organizations are no longer limited by which of these formats their data is in. Databricks intends to work closely with the Delta Lake and Iceberg communities to bring format compatibility to the lakehouse: in the short term within Delta Lake UniForm, and in the long term by evolving toward a single, open, and common standard of interoperability. Databricks and Tabular will work together towards a joint vision of the open lakehouse.
The Rise of Lakehouse Architecture and Format Incompatibility
Databricks pioneered the lakehouse architecture in 2020 to enable the integration of traditional data warehousing workloads with AI workloads on a single, governed copy of data. For this to work, all data has to be in an open format so that different workloads, applications, and engines can access the same data. Lakehouse architecture maximizes enterprise productivity by democratizing access to data. This is in contrast to proprietary data warehouses, where only a proprietary SQL engine can read, write, or share the data, and where data often has to be copied and exported to be used by other applications, creating a high degree of vendor lock-in. Four years later, 74% of enterprises have deployed a lakehouse architecture.
The foundation of the lakehouse is open source data formats that enable ACID transactions on data stored in object storage. These formats dramatically improve the reliability and performance of data operations on the data lake and were specifically designed for open source engines such as Apache Spark, Trino and Presto. To meet this need, Databricks worked with the Linux Foundation to create the Delta Lake project. Since its inception, Delta Lake has attracted over 500 code contributors from a diverse set of organizations, and over 10,000 companies globally use it to process more than 4 exabytes of data on average each day.
Around the same time Delta Lake was created, Ryan Blue and Daniel Weeks developed the Iceberg project at Netflix and donated it to the Apache Software Foundation. Since then, Delta Lake and Iceberg have emerged as the two leading open source standards for lakehouse formats. Even though both of these formats are based on Apache Parquet and share similar goals and designs, they became incompatible due to independent development.
Over time, a number of other open source and proprietary engines have adopted these formats. However, they usually adopted only one of the standards and, more often than not, only part of that standard. The result is fragmented and siloed enterprise data, which undermines the value of the lakehouse architecture.
SOURCE: PRNewsWire