Data Lake vs. Data Warehouse

When is a data warehouse worthwhile and when would a data lake make more sense? We show what companies should look for when making their decision.

In previous blog posts we explained what the terms data lake and data warehouse mean. Both can store large amounts of information and make it available for analysis. However, the data lake (DL) and the data warehouse (DWH) differ fundamentally in their concepts and in the way they store data.

Deciding when to use one or the other depends on what you intend to do with the data. Below, we contrast DL and DWH to help you make your decision.

Comparison of Data Lake and Data Warehouse

A data warehouse brings together data from different sources and converts it into formats and structures that enable direct analysis. It can handle large volumes of data and, as a rule, stores key figures or transaction data. Unstructured data (e.g. images or audio) generally cannot be stored or processed in a DWH. A DWH is recommended when a company needs analytics that draw on historical data from various sources across the enterprise.

A data lake ingests data from different sources in its original format and stores it as-is, without imposing a structure up front. It does not matter whether the data turns out to be relevant to later analyses, and the data lake does not need to know in advance which analyses will be performed. Searching, structuring, or reformatting takes place only when the data is actually needed. A data lake is therefore more flexible and well suited to changing or not yet clearly defined requirements.
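The difference between the two approaches is often described as "schema-on-write" (DWH) versus "schema-on-read" (DL). The following is a minimal sketch of that idea, not a reference implementation: the sample events, the `orders` table, and all function names are hypothetical, an in-memory SQLite database stands in for the warehouse, and a plain Python list stands in for the lake.

```python
import json
import sqlite3

# Hypothetical raw events from two source systems (illustrative data only).
RAW_EVENTS = [
    '{"source": "shop", "order_id": 1, "amount": "19.90", "currency": "EUR"}',
    '{"source": "crm", "customer": "ACME", "note": "called about invoice"}',
    '{"source": "shop", "order_id": 2, "amount": "5.00", "currency": "EUR"}',
]


def load_into_warehouse(conn, raw_events):
    """Schema-on-write: structure and convert records before storing them."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL NOT NULL)"
    )
    rejected = []
    for line in raw_events:
        record = json.loads(line)
        if "order_id" in record and "amount" in record:
            conn.execute(
                "INSERT INTO orders (order_id, amount) VALUES (?, ?)",
                (record["order_id"], float(record["amount"])),
            )
        else:
            # Records that do not fit the predefined schema are not stored.
            rejected.append(line)
    return rejected


def load_into_lake(lake, raw_events):
    """Store every record in its original form; no schema is required."""
    lake.extend(raw_events)


def read_orders_from_lake(lake):
    """Schema-on-read: parse and structure only when an analysis needs it."""
    for line in lake:
        record = json.loads(line)
        if record.get("source") == "shop":
            yield record["order_id"], float(record["amount"])


# Warehouse: data is analysis-ready at query time, but one event was dropped.
conn = sqlite3.connect(":memory:")
rejected = load_into_warehouse(conn, RAW_EVENTS)
warehouse_total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# Lake: every event is kept; structure is applied by the reader.
lake = []
load_into_lake(lake, RAW_EVENTS)
lake_total = sum(amount for _, amount in read_orders_from_lake(lake))
```

In this sketch both paths arrive at the same order total, but only the lake still holds the CRM note, which could serve an analysis nobody had planned when the data was ingested. The price is that every reader of the lake has to do its own parsing and filtering.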

Sources: Oracle & BigData-Insider
