Data warehouses open up a wide range of possibilities for analyzing information that has until now been collected in a number of separate operational systems. Building a single data warehouse, or several subject-specific warehouses known as data marts, lets us approach the data processed in the organization in an entirely new way.
It is now possible to compare and aggregate all the data concerning the same subject that were previously collected in various specialist or departmental systems. It is also possible to load the warehouse with one-off data streams from systems that are no longer in use, which has been very difficult until now. The data warehouse's potential also stems directly from its periodic loading through ETL (Extract, Transform, Load) processes. These make the data uniform by assigning them identical analytical dimensions and storing the information in fact tables as additive measures. But even a very well constructed data warehouse cannot be used effectively without a concrete understanding, within the business context, of the information collected in the system.
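As a minimal sketch of what such an ETL step does, the Python example below conforms records from two source systems to a shared customer dimension and loads them into a fact table with one additive measure. Everything in it is an assumption made for illustration: the two source layouts, the table names `dim_customer` and `fact_revenue`, and the use of an in-memory SQLite database in place of a real warehouse.

```python
import sqlite3

# Hypothetical extracts from two departmental systems.
# Field names, date formats and units differ between them.
sales_system = [
    {"cust": "ACME", "day": "2024-01-05", "amount_eur": 120.0},
    {"cust": "Globex", "day": "2024-01-05", "amount_eur": 80.0},
]
billing_system = [
    {"customer_name": "acme", "date": "05/01/2024", "amount_cents": 4500},
]


def transform_sales(row):
    """Map a sales record to the common shape (customer, ISO date, amount)."""
    return (row["cust"].upper(), row["day"], row["amount_eur"])


def transform_billing(row):
    """Map a billing record to the same shape: ISO date, amount in euros."""
    day, month, year = row["date"].split("/")
    return (
        row["customer_name"].upper(),
        f"{year}-{month}-{day}",
        row["amount_cents"] / 100,
    )


def load(conn, rows):
    """Insert conformed rows, reusing one dimension key per customer."""
    for customer, date, amount in rows:
        conn.execute(
            "INSERT OR IGNORE INTO dim_customer(name) VALUES (?)", (customer,)
        )
        (customer_id,) = conn.execute(
            "SELECT id FROM dim_customer WHERE name = ?", (customer,)
        ).fetchone()
        conn.execute(
            "INSERT INTO fact_revenue(customer_id, date, amount) VALUES (?, ?, ?)",
            (customer_id, date, amount),
        )


def build_warehouse():
    """Create the illustrative star schema and run both loads."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dim_customer (id INTEGER PRIMARY KEY, name TEXT UNIQUE)"
    )
    conn.execute(
        "CREATE TABLE fact_revenue ("
        "customer_id INTEGER REFERENCES dim_customer(id), "
        "date TEXT, amount REAL)"
    )
    load(conn, [transform_sales(r) for r in sales_system])
    load(conn, [transform_billing(r) for r in billing_system])
    return conn


def revenue_by_customer(conn):
    """Aggregate the additive measure across both source systems."""
    rows = conn.execute(
        "SELECT c.name, SUM(f.amount) FROM fact_revenue f "
        "JOIN dim_customer c ON c.id = f.customer_id "
        "GROUP BY c.name ORDER BY c.name"
    ).fetchall()
    return dict(rows)
```

Because both sources end up with the same customer key and the same unit, `revenue_by_customer(build_warehouse())` can sum the measure per customer regardless of which system each record came from.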
To achieve this, it is essential to introduce a central metadata repository: a place that holds a substantive description of the data collected in the data warehouse. The repository also has a dedicated area for additional information about the data, including a range of information that helps users reach the data they are interested in, for example:
The metadata context lets end users access the information they need effectively and gives them confidence in the quality of the data they select for analysis. Regardless of the application in which they were created, all reports and analyses use the same metadata. What is more, with a single metadata repository and a single management application (MetaSource), it is a logical step to prepare data sets for reports or analyses in that application and send the finished database query to the analytical tool. One extremely important question concerns the freshness of the metadata. For this, the system has a two-level procedure for managing metadata renewal. Information interchange standards based on XML and CWM (Common Warehouse Metamodel), together with SQL, mean that the repository and application are easy to integrate.
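The idea of preparing a data set in the metadata application and handing a finished query to the analytical tool can be sketched roughly as follows. This is not MetaSource's actual interface, which the text does not describe. The `METADATA` and `JOINS` structures, the table and column names, and the `build_query` function are all hypothetical, chosen only to show how business terms might be resolved to SQL through a shared repository.

```python
# Hypothetical metadata repository: business terms mapped to physical columns.
METADATA = {
    "Customer": {
        "kind": "dimension",
        "table": "dim_customer",
        "column": "name",
    },
    "Revenue": {
        "kind": "measure",
        "table": "fact_revenue",
        "column": "amount",
        "aggregate": "SUM",
    },
}

# Hypothetical join conditions between fact and dimension tables.
JOINS = {
    ("fact_revenue", "dim_customer"): "fact_revenue.customer_id = dim_customer.id",
}


def build_query(dimension, measure):
    """Translate two business terms into a finished SQL query string."""
    dim = METADATA[dimension]
    mea = METADATA[measure]
    if dim["kind"] != "dimension" or mea["kind"] != "measure":
        raise ValueError("expected a dimension term followed by a measure term")

    join = JOINS[(mea["table"], dim["table"])]
    dim_col = f'{dim["table"]}.{dim["column"]}'
    mea_col = f'{mea["aggregate"]}({mea["table"]}.{mea["column"]})'
    return (
        f'SELECT {dim_col} AS "{dimension}", {mea_col} AS "{measure}" '
        f'FROM {mea["table"]} JOIN {dim["table"]} ON {join} '
        f"GROUP BY {dim_col}"
    )
```

A reporting tool would then receive only the string returned by `build_query("Customer", "Revenue")`, so every report built this way relies on the same definitions of "Customer" and "Revenue".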