The Data Lakehouse is an emerging solution that addresses the weaknesses of Data Warehouses and Data Lakes while maintaining the strengths of both. proSkale recommends four aspects to consider when developing your plan to deploy a Data Lakehouse.
Deploy a Data Lakehouse
These four aspects apply to any technology architecture.
1. Set up a landing and raw data zone if you do not already have one. A Data Lake sits outside the core IT systems, and this zone will ease and accelerate data ingestion.
2. Give the Data Scientists access to the raw data and space to conduct experiments. Data Scientists will be using raw data to test out theories and to find new insights. When insights are found, the associated data may need to be moved to the Data Lakehouse to allow the models to run more efficiently and be embedded within day-to-day operations.
3. Extract and curate (transform) selected data from the Data Lake and use it to populate the Data Lakehouse. This selected data is the source for reporting and dashboards, and it is the official corporate record from which decisions are made.
4. Data Lakehouses have an open architecture in which decentralized, distributed domain teams are responsible for the data in their respective business domains. With decentralization comes an increased risk of exceeding your Cloud budget. Be sure to have cost visibility and forecasting in place on the Data Lakehouse to manage costs.
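To make the flow in points 1, 3 and 4 concrete, here is a minimal Python sketch that uses local folders to stand in for the zones. Everything in it is an illustrative assumption rather than a proSkale implementation: the zone names, the `customer` and `amount` fields, the helper functions and the simple run-rate forecast are all hypothetical. A real deployment would use cloud object storage, a table format and the cost tooling of your Cloud provider.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical zone names: landing and raw for ingestion, curated for the Lakehouse.
ZONES = ("landing", "raw", "curated")


def make_zones(root: Path) -> dict[str, Path]:
    """Create one folder per zone under root and return the paths by name."""
    paths = {}
    for zone in ZONES:
        path = root / zone
        path.mkdir(parents=True, exist_ok=True)
        paths[zone] = path
    return paths


def ingest(zones: dict[str, Path], filename: str) -> Path:
    """Copy a landed file into the raw zone unchanged (point 1)."""
    destination = zones["raw"] / filename
    destination.write_bytes((zones["landing"] / filename).read_bytes())
    return destination


def curate(zones: dict[str, Path], filename: str) -> list[dict]:
    """Select valid records from raw, clean them, and write them to curated (point 3)."""
    raw_lines = (zones["raw"] / filename).read_text().splitlines()
    records = [json.loads(line) for line in raw_lines if line.strip()]
    curated = []
    for record in records:
        customer = str(record.get("customer", "")).strip()
        amount = record.get("amount")
        if not customer or amount is None:
            continue  # skip incomplete records; they remain available in the raw zone
        curated.append({"customer": customer.title(), "amount": round(float(amount), 2)})
    (zones["curated"] / filename).write_text("\n".join(json.dumps(r) for r in curated))
    return curated


def forecast_month_end(spend_to_date: float, day_of_month: int, days_in_month: int) -> float:
    """Naive run-rate forecast: average daily spend so far times days in the month (point 4)."""
    return spend_to_date / day_of_month * days_in_month


def exceeds_budget(spend_to_date: float, day_of_month: int,
                   days_in_month: int, budget: float) -> bool:
    """Return True when the forecast month-end spend is above the budget."""
    return forecast_month_end(spend_to_date, day_of_month, days_in_month) > budget


if __name__ == "__main__":
    zones = make_zones(Path(tempfile.mkdtemp()))
    sample = '{"customer": " alice smith ", "amount": "12.5"}\n{"customer": "", "amount": "3"}\n'
    (zones["landing"] / "orders.jsonl").write_text(sample)
    ingest(zones, "orders.jsonl")
    print(curate(zones, "orders.jsonl"))
    print(exceeds_budget(400.0, 10, 30, budget=1000.0))
```

The raw copy is left untouched so that Data Scientists can still experiment on the original records (point 2), while only cleaned records reach the curated zone. The run-rate forecast is deliberately simple. It is meant to show the kind of early warning a domain team could act on, not a recommended forecasting method.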