As more customers integrate traditional data warehouses with data lakes, the concept of the Lake House is becoming more popular. A lake house combines traditional data warehouse approaches, which analyze structured and semi-structured data, with the unstructured and semi-structured data stored in the data lake. Unstructured data could be images, text, or web pages, all of which need to come together and be understood from a holistic point of view.
There are five key elements to the Lake House:
Data Warehouse: A type of data management system that is designed to enable and support business intelligence (BI) activities
Data Lake: For data that is raw, less well-understood, relatively old, or less valuable. Often used for staging prior to loading into the data warehouse, as an archive of historical data, and as a comprehensive repository for training machine learning models. It enables an enterprise to store all of its data in a cost-effective, elastic environment while providing the necessary processing, persistence, and analytic services to discover new business insights.
Managed Open-source Services: Supports key open-source tools for storage and analysis, including Apache Hadoop, Apache Spark, and Redis
Data Integration: Data in the lake house moves between the lake, the warehouse, and open-source analytics environments, depending on need and use case
Data Catalog: Maintains a complete view of all available data for discovery and governance
The data lake house combines the abilities of a data lake and a data warehouse into one modern platform that processes streamed data and other types of data from a broad range of enterprise data resources.
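To make the elements above more concrete, here is a minimal sketch in Python. It is only an illustration and not Oracle's implementation: an in-memory SQLite database stands in for the warehouse, a list of JSON strings stands in for the lake, a plain dictionary stands in for the catalog, and the names integrate_events and events_per_customer, along with all sample data, are hypothetical.

```python
import json
import sqlite3

# "Warehouse": structured, curated data (in-memory SQLite as a stand-in).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
warehouse.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")]
)

# "Lake": raw, semi-structured records kept as-is (JSON strings as a stand-in).
lake_events = [
    '{"customer_id": 1, "type": "click", "page": "/pricing"}',
    '{"customer_id": 1, "type": "click"}',
    '{"customer_id": 2, "type": "review", "text": "works well"}',
]

# "Catalog": one place that records which datasets exist and where they live.
catalog = {
    "customers": {"location": "warehouse", "format": "table"},
    "events": {"location": "lake", "format": "json"},
}


def integrate_events(conn, raw_events):
    """Data integration: parse lake records and load them into the warehouse."""
    conn.execute("CREATE TABLE events (customer_id INTEGER, type TEXT)")
    rows = [(e["customer_id"], e["type"]) for e in map(json.loads, raw_events)]
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)


def events_per_customer(conn):
    """Query spanning curated customer data and data that came from the lake."""
    return conn.execute(
        "SELECT c.name, COUNT(e.type) FROM customers c "
        "LEFT JOIN events e ON e.customer_id = c.id "
        "GROUP BY c.id, c.name ORDER BY c.name"
    ).fetchall()


integrate_events(warehouse, lake_events)
result = events_per_customer(warehouse)
print(result)
```

The point of the sketch is the shape of the flow: raw records stay in the lake in their original form, the catalog records where each dataset lives, and integration moves only the fields a given analysis needs into the warehouse.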
Enterprise Data Warehouse Overview
In the following slide, we see data sources on the left coming into the warehouse. These include operational databases, external datasets, enterprise or SaaS applications, and events or sensors such as IoT sources; they could even be data streams from social media.
In most scenarios, customers are trying to gather data from many sources and act on it in a variety of ways.
In the data engineering phase, we have data integration solutions that allow customers to batch, stream, or transform data before it flows into the warehouse.
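As a rough illustration of the difference between batching and streaming data through the same transformation, consider the following Python sketch. It does not represent any Oracle data integration service; the record layout and the names transform, load_batch, and load_stream are assumptions made for this example.

```python
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, object]


def transform(record: Record) -> Record:
    """Normalize one raw record: tidy the source name, convert the value to float."""
    return {
        "source": str(record["source"]).strip().lower(),
        "value": float(record["value"]),
    }


def load_batch(records: List[Record]) -> List[Record]:
    """Batch: transform a complete set of records in one pass."""
    return [transform(r) for r in records]


def load_stream(records: Iterable[Record]) -> Iterator[Record]:
    """Stream: transform records one at a time as they arrive."""
    for r in records:
        yield transform(r)


# Hypothetical raw records from two different kinds of sources.
raw = [{"source": " IoT ", "value": "21.5"}, {"source": "SaaS", "value": 3}]

batch_rows = load_batch(raw)
stream_rows = list(load_stream(iter(raw)))
print(batch_rows)
```

Both paths apply the same transformation, so the warehouse receives consistent rows whether the data arrives as a nightly batch or as a continuous feed.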
In the data management area, the data is stored in places such as departmental data warehouses or a data lake.
In the analytics area, customers use solutions such as Oracle Machine Learning, OCI Data Science, and Oracle Analytics Cloud to develop visual models, harness insights from the data, and present them to business leaders to act on.
When you look at your overall data warehouse migration journey, you want to understand your strategy and how to move forward to reach your business objectives.
When migrating to the cloud, a few scenarios are common. Customers may want a like-for-like migration that moves the same data warehouse (and, optionally, any OLTP workloads) to the cloud without disruption. For example, they want to keep the same setup and workload that they have on-premises and move it ‘as is’ to the cloud.
The above figure shows where the Oracle Lake House architecture fits into the broader picture. On the left side, you still have data coming in from various sources such as operational databases and enterprise applications.
All data ingestion, transformation, and integration flows into the lake house. At the very core of the lake house is a unified, efficient, low-cost, and scalable place to bring together all enterprise data, whether that data is raw and unstructured or curated and structured.
On the right side, data analytics and insights are consumed by business leaders or fed into other enterprise applications.
The Lake House helps drive innovation through real-time data access and machine-learning-generated insights based on a unified view of all the data.
Why Choose Oracle Lake House
Comprehensive + best in class: Customers shopping for a cloud data platform now have a broad and comprehensive set of capabilities alongside best-in-class database and data warehouse products
Outcome Focused: Fast time to solution, building on Oracle’s years of industry experience and connections to application data and workflows
Differentiated LOB insight: Co-locating analytics with Oracle SaaS application data leads to faster time to insight
In addition, Oracle Lake House is easy for customers to use and manage because it interoperates with data integration, data science, and analytics services, while enabling database or data warehouse developers to easily extend workloads to the Lake House and beyond.
Data engineers can leverage the open-source tools and services they are familiar with and incorporate them into the Lake House data workflow.
Data scientists can easily query, visualize, and transform the data from the Lake House to develop machine learning models for better insights and outcomes.
Enterprises benefit by eliminating data silos and ensuring that data lakes are not isolated from other corporate data sources.