Gavroshe USA, Inc.

Profile of Metadata Management Practice

Integrating Business and Technical Definitions and Rules

Overview 

Metadata is an integral part of the Data Warehouse environment.   The demand for data warehousing and high-level decision support has increased dramatically.   Organizations are being challenged to roll out ever-increasing functionality in their data warehouses to an expanding audience of end users. 

Understanding the data in the warehouse and where it comes from once seemed a manageable job; a small handful of subject area experts could be relied on to maintain the knowledge of the data in the warehouse and its related business rules.   However, as the scope of data warehouses grows dramatically, that small handful of experts has grown into a crowd.   This crowd of business knowledge owners requires a better, more standardized way to document and communicate their knowledge of the warehouse, its rules and its data sources.  

There are five major functional components that an Organization must address as part of a successful Metadata initiative.

 

Metadata Infrastructure

It is essential to create a carefully managed (i.e., project-based) environment so that the metadata initiative delivers its expected benefits on time. Several key infrastructure deliverables have to be created during the metadata project. These deliverables include, among others:

Repository Implementation

Before any metadata can be stored, a Metadata Repository must be in place. This can be either an “off the shelf” repository product or a custom-built repository tailored specifically to the organization’s needs. Either repository solution could necessitate building scanners to automate the loading of metadata from selected technology sources.

Whether the choice is to buy or to build, the previously mentioned infrastructural deliverables must be defined.   This will ensure that the selected repository solution meets the organization’s long-term metadata management requirements.

There are three major stages (and many sub-stages) in the life cycle of successful metadata management. These three stages work together to make metadata useful in the data warehouse environment:

Metadata Collection

Collecting the right metadata at the right time is the basis for a successful metadata-driven data warehouse implementation. The metadata that can be useful for a data warehouse spans a wide variety of domains, from physical structure data, to logical model data, to business usage and rules. Each of these types requires its own collection strategy. Some can be automated to a great degree, while others will involve more manual effort.

Warehouse Data Sources: One of the most fundamental types of metadata that can be collected is information about the potential sources of data for a data warehouse. Even though these data sources may be on different platforms and stored in very different formats, we have a consistent need to understand both the physical structure of the data and the meaning of the data. It is important to ensure that the entity and element definitions, business rules, valid values, and usage guidelines are transposed properly.
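As an illustration of how the physical structure of a source might be collected automatically, the following is a minimal sketch of a scanner, assuming the source happens to be a SQLite database. The names ColumnMetadata, scan_sqlite_source, and the orders_system source are hypothetical and chosen only for this example; a real repository product would supply its own scanners and record layout. The business definition is left empty because it normally has to be supplied by a subject area expert.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMetadata:
    """Physical structure plus business meaning for one source column."""
    source: str
    table: str
    column: str
    data_type: str
    nullable: bool
    definition: str = ""  # business definition, filled in manually later


def scan_sqlite_source(conn, source_name):
    """Read the physical structure of every table in a SQLite database."""
    records = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields: cid, name, type, notnull, dflt_value, pk
        for _cid, name, col_type, notnull, _default, _pk in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            records.append(
                ColumnMetadata(source_name, table, name, col_type, not notnull)
            )
    return records


# Hypothetical operational source, created in memory for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_id INTEGER NOT NULL, cust_nm TEXT)")
scanned = scan_sqlite_source(conn, "orders_system")
for record in scanned:
    print(record)
```

The scan captures only the physical side (names, types, nullability). The meaning of each element, its business rules, and its valid values still have to be added to each record by hand.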

Data Models: As the warehouse is designed, its data models evolve. It is critical to be able to trace the connections from the enterprise data model to the warehouse model so that we can understand how specific logical entities are implemented in the warehouse. Once you have collected metadata about your enterprise model and about the warehouse model, you must correlate them. In most cases, this mapping will be a manual task, since there is no automated way to discern reliably which warehouse data structures were derived from which enterprise model entities. It is important to ensure that the entity and element definitions, business rules, valid values, and usage guidelines are transposed properly.

Warehouse Mappings: As the analysis on the warehouse progresses and elements from operational systems, external data, or other sources are mapped to the warehouse structures, this mapping should be collected and maintained as metadata. In cases where this mapping is done manually and code will be written to effect transformations of the data, the design specifications for the mappings will act as the source for this metadata.
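One way to picture mapping metadata is as a set of source-to-target records that can be queried in either direction. The sketch below is illustrative only: the ElementMapping structure, the system and column names, and the transformation descriptions are all hypothetical, standing in for whatever the design specifications actually contain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementMapping:
    """One source element mapped to one warehouse column."""
    source_system: str
    source_element: str
    target_table: str
    target_column: str
    transformation: str  # rule copied from the design specification


# Hypothetical mappings, as they might be transcribed from a design spec.
mappings = [
    ElementMapping("orders_system", "customer.cust_nm",
                   "dim_customer", "customer_name", "trim and title-case"),
    ElementMapping("orders_system", "orders.amt",
                   "fact_sales", "sale_amount", "convert cents to dollars"),
    ElementMapping("external_feed", "fx.rate",
                   "fact_sales", "sale_amount", "apply daily exchange rate"),
]


def sources_of(target_table, target_column, all_mappings):
    """Answer 'where does this warehouse column come from?'"""
    return [m for m in all_mappings
            if m.target_table == target_table
            and m.target_column == target_column]


def targets_of(source_system, source_element, all_mappings):
    """Answer 'which warehouse columns depend on this source element?'"""
    return [m for m in all_mappings
            if m.source_system == source_system
            and m.source_element == source_element]


for m in sources_of("fact_sales", "sale_amount", mappings):
    print(f"{m.source_system}.{m.source_element}: {m.transformation}")
```

Keeping the transformation rule on the mapping record lets a warehouse developer see both where a column comes from and how it was derived.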

Warehouse Usage Information: This is metadata about the usage of the information in the warehouse. This information can be captured only after the warehouse has been rolled out to users; however, collecting it is not easy. It is important to understand who is using the warehouse and how they are using it, in order to facilitate better performance tuning of the warehouse, encourage greater re-use of existing queries, and show how information in the warehouse is being used to solve business problems.
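The sketch below illustrates the kind of summary that usage metadata can support, assuming a query log is already available as (user, query text, tables referenced) entries. That assumption is a significant one, since obtaining such a log is the difficult part. The log entries and function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical query log: (user, query text, tables referenced).
query_log = [
    ("alice", "SELECT region, SUM(sale_amount) FROM fact_sales GROUP BY region",
     ["fact_sales"]),
    ("bob", "select region, sum(sale_amount) from fact_sales group by region",
     ["fact_sales"]),
    ("carol", "SELECT customer_name FROM dim_customer",
     ["dim_customer"]),
    ("alice", "SELECT * FROM fact_sales JOIN dim_customer USING (customer_id)",
     ["fact_sales", "dim_customer"]),
]


def normalize(sql):
    """Collapse case and whitespace so equivalent query texts compare equal."""
    return " ".join(sql.lower().split())


def table_access_counts(log):
    """How often each table is queried: an input to performance tuning."""
    counts = Counter()
    for _user, _sql, tables in log:
        counts.update(tables)
    return counts


def users_per_table(log):
    """Who is using which part of the warehouse."""
    users = defaultdict(set)
    for user, _sql, tables in log:
        for table in tables:
            users[table].add(user)
    return dict(users)


def shared_queries(log):
    """Queries issued by more than one user: candidates for re-use."""
    users_by_query = defaultdict(set)
    for user, sql, _tables in log:
        users_by_query[normalize(sql)].add(user)
    return {q: u for q, u in users_by_query.items() if len(u) > 1}


print(table_access_counts(query_log).most_common())
print(shared_queries(query_log))
```

The text normalization here is deliberately crude. It only catches queries that differ in case or spacing, so it should be read as a starting point rather than a reliable way to detect equivalent queries.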

Maintaining Metadata

Once all of the metadata has been captured initially, it must be maintained. Metadata must be kept up to date with reality. As with metadata collection, automation is the key to maintaining current, high-quality information. The best approach for maintaining each specific type of metadata depends on how it was originally collected, how often it changes, and the volume of metadata generated.
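A simple form of automated maintenance is to compare what the repository has recorded with what a fresh scan of the source reports, and to flag the differences for review. The following is a minimal sketch under that assumption. The detect_drift function and the sample column data are hypothetical, and both inputs are simplified to a mapping from (table, column) to data type.

```python
def detect_drift(recorded, actual):
    """Compare recorded column metadata with the structure observed now.

    Both arguments map (table, column) -> data type.
    """
    recorded_keys = set(recorded)
    actual_keys = set(actual)
    return {
        # Columns present in the source but missing from the repository.
        "added": sorted(actual_keys - recorded_keys),
        # Columns still documented but no longer present in the source.
        "removed": sorted(recorded_keys - actual_keys),
        # Columns present in both whose data type has changed.
        "retyped": sorted(
            key for key in recorded_keys & actual_keys
            if recorded[key] != actual[key]
        ),
    }


# Hypothetical repository contents versus a fresh scan of the source.
recorded = {
    ("customer", "cust_id"): "INTEGER",
    ("customer", "cust_nm"): "TEXT",
    ("customer", "fax_no"): "TEXT",
}
actual = {
    ("customer", "cust_id"): "INTEGER",
    ("customer", "cust_nm"): "VARCHAR(80)",
    ("customer", "email"): "TEXT",
}

drift = detect_drift(recorded, actual)
for kind, columns in drift.items():
    print(kind, columns)
```

A check like this covers only structural metadata. Business definitions and rules change without leaving a trace in the schema, so they still need a manual review cycle.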

Metadata Deployment

This is where the pains taken to collect and maintain good-quality metadata come to fruition, for metadata is only valuable to an organization when it is deployed and used. In the data warehouse environment, metadata can be deployed in different ways to different groups of recipients, including data warehouse developers, warehouse maintainers, and end users.

Gavroshe will assist the Organization in addressing the five functional areas of Metadata Implementation and Management.

Gavroshe will provide a skills transfer program to selected resources of the Organization. From this base, the Organization can continue Metadata Management on its own.