Lakehouse design

6/24/2023

A data lake is a storage repository that holds a large amount of data in its native, raw format. Data lake stores are optimized for scaling to terabytes and petabytes of data. The data typically comes from multiple heterogeneous sources and may be structured, semi-structured, or unstructured. The idea with a data lake is to store everything in its original, untransformed state. This approach differs from a traditional data warehouse, which transforms and processes the data at the time of ingestion.

Typical uses for a data lake include data exploration, data analytics, and machine learning. Keeping the data in raw form has several advantages. Data is never thrown away, which is especially useful in a big data environment, when you may not know in advance what insights are available from the data. Users can explore the data and create their own queries. Loading may be faster than with traditional ETL tools. A data lake is also more flexible than a data warehouse, because it can store unstructured and semi-structured data.

A complete data lake solution consists of both storage and processing. Data lake storage is designed for fault tolerance, effectively unlimited scalability, and high-throughput ingestion of data with varying shapes and sizes. Data lake processing involves one or more processing engines that are built with these goals in mind and can operate on data stored in a data lake at scale.

A data lake can also act as the data source for a data warehouse. With this approach, the raw data is ingested into the data lake and then transformed into a structured, queryable format. Typically this transformation uses an ELT (extract-load-transform) pipeline, where the data is ingested first and then transformed in place.
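The ELT pattern described above can be illustrated with a small sketch. It is not tied to any particular data lake product: a temporary local directory stands in for lake storage, and the zone names (raw, curated), the function names, and the sample sensor records are all hypothetical, chosen only for illustration. The point it tries to show is the ordering: the payload is stored untouched first, and a schema is applied only later, when the data is transformed inside the lake.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical lake layout: a "raw" zone for untouched data and a
# "curated" zone for the structured, queryable output.
RAW_ZONE = "raw"
CURATED_ZONE = "curated"


def load_raw(lake: Path, name: str, payload: str) -> Path:
    """Extract + load: store the source payload as-is, without parsing it."""
    target = lake / RAW_ZONE / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(payload, encoding="utf-8")
    return target


def transform(lake: Path, raw_file: Path, columns: list[str]) -> Path:
    """Transform: read raw JSON lines from the lake and write a CSV table.

    The schema (columns) is applied only at this point. Fields missing from
    a record become empty strings; extra fields are skipped here but remain
    available in the raw file.
    """
    out = lake / CURATED_ZONE / (raw_file.stem + ".csv")
    out.parent.mkdir(parents=True, exist_ok=True)
    with raw_file.open(encoding="utf-8") as src, \
            out.open("w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(
            dst, fieldnames=columns, extrasaction="ignore", restval=""
        )
        writer.writeheader()
        for line in src:
            if line.strip():
                writer.writerow(json.loads(line))
    return out


def run_demo() -> list[dict[str, str]]:
    """Run the pipeline in a temporary directory and return the curated rows."""
    events = (
        '{"device": "sensor-1", "temp_c": 21.5, "firmware": "1.2"}\n'
        '{"device": "sensor-2", "temp_c": 19.0}\n'
    )
    with tempfile.TemporaryDirectory() as tmp:
        lake = Path(tmp)
        raw = load_raw(lake, "events.jsonl", events)
        curated = transform(lake, raw, ["device", "temp_c"])
        # The raw file is still intact after the transform.
        assert raw.read_text(encoding="utf-8") == events
        with curated.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))


if __name__ == "__main__":
    for row in run_demo():
        print(row)
```

Because the raw file is kept, a later job could run transform again with a different column list (for example, one that includes firmware) without re-ingesting anything from the source. A traditional ETL flow would have had to decide on those columns before loading.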
