SOLIXCloud Enterprise Data Lake is an innovative, highly scalable, third-generation cloud data platform that helps customers manage, mine and monetize their enterprise data assets. It delivers end-to-end data integration and data engineering solutions for both new and existing customers, alongside the best-of-breed cloud data management and secure governance that SOLIXCloud customers have relied upon for years.
The new data lake, scheduled for availability in August of this year, adds Apache Hudi with ACID transactions and a unified data catalog, transforming how businesses manage and analyze their growing volumes of structured and unstructured enterprise data so they can make better just-in-time decisions and critical predictions about the future of the business.
Along with robust governance for the safety, security, compliance, and lifecycle of data, SOLIXCloud Enterprise Data Lake is a multi-cloud solution supporting AWS, Azure, IBM Cloud, Oracle Cloud, Google Cloud, and hybrid on-premises deployments.
“Data warehouses and first generation data lakes have not met the business requirement of storing, organizing, and analyzing enterprise data to maximize profitability,” said Dr. James Short, Lead Scientist at the San Diego Supercomputer Center. “Solix’s Enterprise Data Lake makes historical and real time data available, organized to meet the needs of data scientists, data engineers and business analysts, so they can discover new insights and build new profit-centric AI and machine learning solutions.”
Running on the cloud-native Solix Common Data Platform (CDP), SOLIXCloud Enterprise Data Lake is a third-generation, transactional, streaming data lake that brings core data warehouse and database functionality directly to the data lake. Designed for high-performance, real-time cloud database workloads, the SOLIXCloud Enterprise Data Lake supports streaming data ingestion and data pipelining, and delivers transactional guarantees to the data lake through consistent atomic writes and concurrency controls tailored for longer-running data lake transactions. To ensure data infrastructures are not tied to any one vendor, the SOLIXCloud Enterprise Data Lake supports Apache Hudi at early customer access, with open table format support for Apache Iceberg and Delta Lake planned to follow.
SOLIXCloud Enterprise Data Lake is a response to the growing demand from our customers for end-to-end data fabric solutions that support serverless, low-latency transactions for intelligent enterprise applications such as generative AI, streaming analytics and machine learning operations (MLOps). The SOLIXCloud Enterprise Data Lake collects any data, including metadata, from any source, and delivers real-time data pipeline solutions with federated data governance controls including data security, consumer data privacy, compliance and Information Lifecycle Management (ILM). SOLIXCloud Enterprise Data Lake may also be added to existing SOLIXCloud solution landscapes to quickly expand data platform capabilities.
“Cloud data platforms are a cornerstone to any digital transformation strategy,” said John Ottman, Executive Chairman of Solix Technologies, Inc. “SOLIXCloud Enterprise Data Lake delivers data streaming, data governance, data integration and data engineering capabilities so customers can fully capitalize and monetize their data assets.”
Key SOLIXCloud Enterprise Data Lake database features with Open Table Formats for Apache Hudi (Hadoop Upserts Deletes and Incrementals) include:
ACID Transactions – The Apache Hudi data lake framework provides real-time ACID transactional guarantees to your data lake, with consistent atomic writes, isolated reads and concurrency controls tailored for longer-running data lake transactions. Features include tables, transactions, upserts/deletes, advanced indexing methods to manage and query large datasets, clustering/compaction, performance optimizations that scale writes and reads independently and optimize infrastructure, bulk inserts and transactional writes, snapshots so that readers don’t block writers and writers don’t block readers, and time travel to query past versions of a dataset, which is useful for audit trails or rollbacks.
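For illustration, a minimal sketch of an atomic upsert and a time-travel read using the open-source Apache Hudi Spark datasource (not the SOLIXCloud API) might look like the following; the table name, fields, commit instant and storage path are hypothetical, and a Spark build with a matching org.apache.hudi hudi-spark bundle on the classpath is assumed.

```python
# Minimal sketch: an ACID upsert into an Apache Hudi table via the open-source
# Spark datasource, followed by a time-travel read of an earlier table version.
# Table name, fields and paths below are illustrative, not Solix-specific.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hudi-acid-sketch")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)

base_path = "s3a://example-bucket/lake/orders"   # hypothetical table location

updates = spark.createDataFrame(
    [("o-1001", "shipped", "2024-05-01 10:15:00")],
    ["order_id", "status", "updated_at"],
)

hudi_opts = {
    "hoodie.table.name": "orders",
    "hoodie.datasource.write.recordkey.field": "order_id",
    "hoodie.datasource.write.precombine.field": "updated_at",
    "hoodie.datasource.write.operation": "upsert",
}

# Each save() is one atomic Hudi commit: readers never observe a partial write.
updates.write.format("hudi").options(**hudi_opts).mode("append").save(base_path)

# Time travel: query the table as it existed at an earlier commit instant,
# which is what makes audit trails and rollbacks straightforward.
as_of = (
    spark.read.format("hudi")
    .option("as.of.instant", "20240401000000")   # hypothetical commit timestamp
    .load(base_path)
)
as_of.show()
```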
Fast Data Processing – Reimagine slow batch data processing jobs with a powerful new incremental approach to reading and writing data using streaming ingestion. Fast data processing runs alongside batch data processing and gives customers a way to rethink and re-engineer ETL processes for Hive and Spark jobs that run too slowly and consume too many resources. Incremental data processing handles only the data that is new or updated since the last batch, making data pipelines more efficient.
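As a sketch of what incremental processing looks like with the open-source Apache Hudi Spark datasource (again, not the Solix tooling itself), a job can read only the records committed after a given instant instead of rescanning the full table; the path and instant below are hypothetical.

```python
# Minimal sketch of an incremental read: only rows committed after the given
# begin instant are returned, so downstream logic processes just the delta.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-incremental-sketch").getOrCreate()

delta = (
    spark.read.format("hudi")
    .option("hoodie.datasource.query.type", "incremental")
    .option("hoodie.datasource.read.begin.instanttime", "20240501000000")
    .load("s3a://example-bucket/lake/orders")
)

# Run the usual transformation over just the new/updated rows.
delta.createOrReplaceTempView("orders_delta")
spark.sql("SELECT order_id, status FROM orders_delta WHERE status = 'shipped'").show()
```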
High-performance Loading – Even moderately large NoSQL database installations store billions of rows, making full bulk loads infeasible and demanding a more efficient approach to ingesting data at that volume. Replace costly and inefficient bulk loads with managed ingestion via upserts and incremental streaming to keep your data up to date.
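One way this pattern looks with the open-source Apache Hudi Spark datasource, rather than Solix's managed ingestion, is to seed the table once with a bulk insert and then keep it current with small upsert batches; the paths, table and field names below are hypothetical.

```python
# Minimal sketch: one-time bulk_insert to seed the table, then ongoing upserts
# for incremental changes instead of repeated full bulk loads.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-loading-sketch").getOrCreate()

base_path = "s3a://example-bucket/lake/events"
common_opts = {
    "hoodie.table.name": "events",
    "hoodie.datasource.write.recordkey.field": "event_id",
    "hoodie.datasource.write.precombine.field": "event_ts",
}

# One-time seed: bulk_insert skips per-record index lookups, so the first load scales.
seed = spark.read.parquet("s3a://example-bucket/export/events_snapshot/")   # hypothetical export
seed.write.format("hudi") \
    .options(**{**common_opts, "hoodie.datasource.write.operation": "bulk_insert"}) \
    .mode("overwrite").save(base_path)

# Ongoing ingestion: merge each small batch of changes in place with upsert.
changes = spark.read.json("s3a://example-bucket/incoming/events_batch.json")   # hypothetical feed
changes.write.format("hudi") \
    .options(**{**common_opts, "hoodie.datasource.write.operation": "upsert"}) \
    .mode("append").save(base_path)
```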
Additional SOLIXCloud Enterprise Data Lake features include:
Data Catalog – Data scientists and data professionals require a detailed inventory of all data assets so they can quickly find the most appropriate data for any analytical or enterprise intelligence purpose. Features include role-based access control, a business glossary, data classification, a metadata repository and data lineage.
Low-code, Incremental Data Pipelines – To create real-time, incremental data pipelines from source to target that are fit for use by artificial intelligence (AI), machine learning (ML) and advanced analytics, data engineers require drag-and-drop tools to collect data from any source and apply data cleansing, data enrichment or any other preparation, as sketched below. By transforming files, removing erroneous records, masking sensitive data, tagging and labeling, or combining data objects into enterprise business records, Solix Data Pipeline improves data quality and the accuracy of data warehouse, machine learning and advanced analytics applications.
A continuous data delivery processing framework is ideal for low-latency, minute-level analytics and change data capture workloads. Create declarative templates for incremental ingestion and transformation, and provision continuous data delivery pipelines for machine learning operations. Automate the operational burdens of scheduling, monitoring and moving enterprise data.
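Since the pipeline builder itself is a drag-and-drop tool, the following plain-PySpark sketch only illustrates the kind of cleansing, masking and tagging steps such a pipeline applies; the source, column names and target are hypothetical.

```python
# Illustrative only: typical pipeline preparation steps (deduplicate, apply a
# quality rule, mask sensitive data, tag for lineage) expressed in plain PySpark.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pipeline-prep-sketch").getOrCreate()

raw = spark.read.json("s3a://example-bucket/landing/customers/")   # hypothetical landing zone

prepared = (
    raw.dropDuplicates(["customer_id"])                    # remove duplicate/erroneous records
       .filter(F.col("email").isNotNull())                 # simple data-quality rule
       .withColumn("ssn", F.sha2(F.col("ssn"), 256))       # mask sensitive data
       .withColumn("ingest_date", F.current_date())        # tag records for lineage/labeling
)

prepared.write.mode("append").parquet("s3a://example-bucket/curated/customers/")
```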
Change Data Capture (CDC) – Change data capture enables seamless, efficient database ingestion into your data lake. SOLIXCloud Enterprise Data Lake is designed to support fast upserts and deletes of data, making it well suited to CDC and streaming use cases.
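A minimal sketch of applying a CDC changelog with the open-source Apache Hudi Spark datasource might look like the following; the changelog schema (an "op" column with I/U/D markers), paths and field names are hypothetical and not tied to any particular CDC tool or to the Solix implementation.

```python
# Minimal sketch: inserts and updates from a changelog become upserts, while
# deletes are applied as a separate delete operation keyed on the record key.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-cdc-sketch").getOrCreate()

base_path = "s3a://example-bucket/lake/customers"
opts = {
    "hoodie.table.name": "customers",
    "hoodie.datasource.write.recordkey.field": "customer_id",
    "hoodie.datasource.write.precombine.field": "change_ts",
}

changelog = spark.read.json("s3a://example-bucket/cdc/customers_batch.json")   # hypothetical feed

# Inserts and updates: merge into the table as upserts.
changelog.filter("op IN ('I', 'U')").write.format("hudi") \
    .options(**{**opts, "hoodie.datasource.write.operation": "upsert"}) \
    .mode("append").save(base_path)

# Deletes: Hudi removes rows matching the record keys in this batch.
changelog.filter("op = 'D'").write.format("hudi") \
    .options(**{**opts, "hoodie.datasource.write.operation": "delete"}) \
    .mode("append").save(base_path)
```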
Apache Spark – With its parallel, in-memory data processing, Apache Spark is the world’s most widely used engine for scalable computation over structured and unstructured data. Thousands of companies, including 80% of the Fortune 500, use Apache Spark™ today.
Federated Data Governance – Federated data governance provides a centralized control framework for situations where several groups have authority over the data. Through delegated authorities, virtual policy enforcement and audit management, federated data governance enables compliance control over remote tables and data, reducing risk and improving security for decentralized, multi-cloud data operations.
SOURCE: PRNewsWire