Data Warehouse

A data warehouse is a type of data repository that organizations use to consolidate large volumes of data before they are processed, analyzed or used by downstream applications.

Before entering a data warehouse, data from heterogeneous sources must first pass through a staging area that integrates it into a more homogeneous, consistent format. Applications can then access the data in the warehouse for reporting, or to give end users a view of only the data most directly related to their roles.
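The staging step described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two sources ("CRM" and "billing"), their field names, and the target schema are hypothetical and do not describe any particular product.

```python
from datetime import date, datetime

# Two hypothetical sources that describe the same kind of record in different shapes.
crm_rows = [{"CustomerID": "17", "SignupDate": "03/15/2021", "Country": "us"}]
billing_rows = [{"cust_id": 42, "signup": "2020-11-02", "country_code": "DE"}]


def stage_crm(row):
    """Map a CRM-style row onto the common (assumed) warehouse schema."""
    return {
        "customer_id": int(row["CustomerID"]),
        "signup_date": datetime.strptime(row["SignupDate"], "%m/%d/%Y").date(),
        "country": row["Country"].upper(),
    }


def stage_billing(row):
    """Map a billing-style row onto the same schema."""
    return {
        "customer_id": int(row["cust_id"]),
        "signup_date": date.fromisoformat(row["signup"]),
        "country": row["country_code"].upper(),
    }


def load_warehouse(crm, billing):
    """Stage both sources, then combine them into one consistent table."""
    staged = [stage_crm(r) for r in crm] + [stage_billing(r) for r in billing]
    return sorted(staged, key=lambda r: r["customer_id"])


warehouse = load_warehouse(crm_rows, billing_rows)
```

After staging, every record has the same keys, types and formatting regardless of where it came from, which is what lets reporting tools treat the warehouse as a single source.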

Data warehouses eliminate the “silo-ing” of data into multiple sources, where comparing relationships or even accessing data across silos can prove difficult. Warehouses were once necessary because many applications lacked the capability or processing power to analyze data or perform ETL (extract, transform, load) tasks themselves.

However, modern applications and platforms can often readily share data with one another without the need for a separate “integration” or “staging” process. Additionally, data typically no longer needs to be stored within a separate data warehouse for it to be used by end applications or analyzed across sources.

Many large companies still choose to use data warehouses or a similar architecture in order to ensure that data is properly cleaned and integrated across sources so that it can be accessed readily. Ultra-powerful data warehouse systems can also greatly reduce latency when processing data.

These benefits can be obtained only with a significant investment in storage capacity and processing power. Otherwise, the act of extracting data, duplicating it, processing it, storing it in an enormous repository and then accessing it can severely tax networked systems. Considering that a greater volume of data is now generated in a few hours than was once generated over a decade, a data warehousing architecture requires ever-increasing investment to deliver the benefits of a consolidated, monolithic data repository.

Instead of a data warehouse, many companies now use applications that can directly integrate data across silos and make it readily available for analysis or use, cutting out the “middleman” of the data warehouse. These systems include Business Intelligence tools that are capable of combining data from a range of sources and analyzing it in real time. Many applications are now also capable of performing tasks that once required a data warehouse, such as assigning metadata or compiling summary data.
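As a rough sketch of this idea, the Python example below joins two separate databases at query time and compiles summary data without first copying either into a central repository. It uses SQLite's ATTACH DATABASE purely as a stand-in for two silos; the table names, columns and figures are invented for illustration, and real Business Intelligence tools connect to their sources in their own ways.

```python
import sqlite3

# One connection with a second, independent in-memory database attached to it.
# "main" and "billing" stand in for two separate silos (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS billing")

conn.execute("CREATE TABLE main.customers (id INTEGER, region TEXT)")
conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL)")

conn.executemany(
    "INSERT INTO main.customers VALUES (?, ?)",
    [(1, "US"), (2, "US"), (3, "EU")],
)
conn.executemany(
    "INSERT INTO billing.invoices VALUES (?, ?)",
    [(1, 100.0), (2, 50.0), (3, 75.5)],
)

# Summary data is compiled on demand by joining across both databases.
summary = conn.execute(
    """
    SELECT c.region, SUM(i.amount)
    FROM main.customers AS c
    JOIN billing.invoices AS i ON i.customer_id = c.id
    GROUP BY c.region
    ORDER BY c.region
    """
).fetchall()

conn.close()
```

Here the per-region totals are produced by the query itself, so no intermediate staging table or consolidated copy of the data is created.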

In sum, data warehouses were once a necessary method for storing, consolidating and analyzing data, but the costs and complexity of the architecture have become so great that only large enterprises with immense resources and highly specific use cases can truly benefit.