Gavroshe USA, Inc.

Profile of Total Quality Data Management Practice

Improving Content, Definition and Presentation of Data and Information

Overview

Data quality problems currently cost U.S. businesses in excess of US$600 billion per year, according to survey data and interviews with industry experts and customers. Data residing in legacy systems and data warehouses may be of poor quality, and any interface may introduce data that is not clean. In addition, data may become ‘dirty’ over time due to faulty processes or because rules and standards are not adhered to.

Quality data does not necessarily mean perfect data. It is essential to set quality expectations, because deliberate tradeoffs are often made between speed, convenience and accuracy. For example, data may be updated weekly, so the results of a query may be some days out of date.

For any data migration to be timely and successful, it is imperative to know that the disparate data sources to be migrated have been ‘cleaned’ and re-engineered into the format or ‘model’ expected by the target system.

Without the ‘knowledge’ that the data to be migrated is right, the migration process must take an iterative ‘amend & load’ approach, leading to increased project time and cost. (How many loops this iterative approach will go through is a complete guess, and every project underestimates the amount of time required.)

Introducing a phase into the project to analyze and ‘correct’ source data anomalies makes the actual migration and implementation phases quantifiable and accurately forecast phases of the Project Plan. The approach to data cleansing and migration must therefore include the following five major components.

The initial phase of the Cleanse and Migration approach is the Analysis of the source data, working with both the IT and Business communities to address both data accuracy and relevance. Data Analysis, also known as Data Profiling and Mapping, comes before ANY Migration Project data staging or mapping exercises. It ensures that the project starts from a known point, with a ‘clean’ data platform. This process reduces project time scales and costs by removing the ‘Trial & Error’ iterative approach. In addition, it enables the organization to identify ‘weak links’ in the Data Supply Chain and provides the opportunity to implement new processes.
 

Our Approach to Data Quality Management

Answering Questions Around Quality

Businesses must attempt to answer the following questions to help work out a tactical solution for data quality improvement:

It is advisable to evaluate the costs and benefits of a data cleanup effort. All activities involved in data quality should be considered, and the cost side should include the cost of ‘poor’ data quality itself. A go/no-go decision to commit resources is made at this stage. Problems associated with data quality can be effectively recognized and addressed only if companies monitor, analyze and report on them on an ongoing basis.

Gavroshe assists the Organization in specifying its strategy for Data Quality Improvement. This is an essential step to ensure that a long-term approach is put in place that will enable the company to sustain any migration, be it a BI initiative or a new application.

Data Profiling and Mapping

Profiling and Mapping uses a technology-intensive (not people-intensive) approach coupled with in-depth Business experience. The complexity and volumes of data within the source systems mean that manual analysis, cleansing and transformation would require multiple teams of many analysts. Notwithstanding the cost implications, the ‘quality’ of the analyzed and cleansed data would be significantly inferior without the use of specialist tools. The approach we are adopting therefore requires specialist analysis and updating tools.

Gavroshe assists The Organization in identifying the Source data to be analyzed and helps define the business rules and measures against which the data is compared to determine compliance. A set of cleansing requirements is produced as a result of this comparison and reviewed with the Business Experts. At this point the cleansing activities can be prioritized and planned.
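
The comparison of source data against business rules can be illustrated with a minimal sketch. It is not Gavroshe's tooling: the rule names, the field names (customer_id, postcode) and the sample records are hypothetical, chosen only to show how a per-rule compliance measure and a list of failing records might be produced as input to the cleansing requirements.

```python
from typing import Callable, Dict, List

# A business rule is modelled here as a predicate on one record (assumption).
Rule = Callable[[dict], bool]


def postcode_is_5_digits(rec: dict) -> bool:
    """Hypothetical rule: postcode must be exactly five digits."""
    value = str(rec.get("postcode", ""))
    return value.isdigit() and len(value) == 5


# Hypothetical rule set; real rules would come from the Business Experts.
RULES: Dict[str, Rule] = {
    "customer_id_present": lambda rec: bool(rec.get("customer_id")),
    "postcode_is_5_digits": postcode_is_5_digits,
}


def profile(records: List[dict], rules: Dict[str, Rule]) -> Dict[str, dict]:
    """Return, for each rule, the failing record indexes and the compliance rate."""
    report = {}
    total = len(records)
    for name, rule in rules.items():
        failures = [i for i, rec in enumerate(records) if not rule(rec)]
        rate = (total - len(failures)) / total if total else 1.0
        report[name] = {"failures": failures, "compliance": rate}
    return report


# Illustrative sample data only.
SAMPLE = [
    {"customer_id": "C001", "postcode": "10001"},
    {"customer_id": "", "postcode": "1000A"},
    {"customer_id": "C003", "postcode": "9021"},
    {"customer_id": "C004", "postcode": "60614"},
]

REPORT = profile(SAMPLE, RULES)
for rule_name, result in REPORT.items():
    print(rule_name, result)
```

The failing indexes per rule give the Business Experts a concrete list to review, and the compliance rate offers one possible measure for prioritizing cleansing activities.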

Data Cleansing

The data cleansing process runs in parallel with the data analysis task. As data quality issues are uncovered, the Analysis and Cleansing teams, in conjunction with Business users, have to identify:

The data cleansing approach will not be separate from the overall data migration method but an integral part of it. The data migration approach will identify the transformation and validation rules to be used at migration.
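
One way to picture cleansing as an integral part of migration is a single pass that applies transformation rules and then validation rules to each record. The sketch below is an assumption-laden illustration, not the method described in this profile: the function names, the country field and the set of accepted country codes are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Transform = Callable[[dict], dict]
Validation = Callable[[dict], bool]


def normalize_country(rec: dict) -> dict:
    """Hypothetical transformation rule: trim and upper-case the country code."""
    out = dict(rec)
    out["country"] = str(rec.get("country", "")).strip().upper()
    return out


def has_known_country(rec: dict) -> bool:
    """Hypothetical validation rule: country must be in an agreed reference set."""
    return rec.get("country") in {"US", "GB", "DE"}


def migrate(
    records: List[dict],
    transforms: List[Transform],
    validations: Dict[str, Validation],
) -> Tuple[List[dict], List[Tuple[dict, List[str]]]]:
    """Apply transforms, then validations; split records into loaded and rejected."""
    loaded, rejected = [], []
    for rec in records:
        for transform in transforms:
            rec = transform(rec)
        failed = [name for name, check in validations.items() if not check(rec)]
        if failed:
            # Rejected records keep the names of the rules they failed,
            # so they can be routed back to the cleansing activity.
            rejected.append((rec, failed))
        else:
            loaded.append(rec)
    return loaded, rejected


LOADED, REJECTED = migrate(
    [{"id": 1, "country": " us "}, {"id": 2, "country": "Atlantis"}],
    [normalize_country],
    {"known_country": has_known_country},
)
print(LOADED)
print(REJECTED)
```

Keeping the rejected records together with the rules they failed is one possible design choice; it lets the cleansing teams see why a record did not load instead of discovering it through repeated ‘amend & load’ cycles.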

There are four basic methods for “cleaning” data:

De-coupling is required when a single record contains data describing two or more entities, such as individuals, products, or companies. To correct data in a relational database, analysts use SQL or a commercial data quality tool with built-in SQL support. To correct defects in non-SQL databases, the native data manipulation language is required. To correct data during extraction or loading processes, analysts use an ETL tool that offers a visual interface for defining transformations.
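
As a rough illustration of de-coupling with SQL, the sketch below uses Python's built-in sqlite3 module and an in-memory database. The customer table, its columns, the sample rows and the " & " separator are assumptions made for the example; real source data would need rules agreed with the Business users for recognizing combined entities.

```python
import sqlite3

# In-memory database with a hypothetical customer table (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, account TEXT)"
)
conn.executemany(
    "INSERT INTO customer (name, account) VALUES (?, ?)",
    [("John Smith & Mary Smith", "A-100"), ("Ann Lee", "A-200")],
)


def decouple(connection: sqlite3.Connection, separator: str = " & ") -> int:
    """Split rows whose name holds several entities into one row per entity.

    Returns the number of original rows that were split.
    """
    rows = connection.execute(
        "SELECT id, name, account FROM customer WHERE instr(name, ?) > 0",
        (separator,),
    ).fetchall()
    for row_id, name, account in rows:
        parts = [p.strip() for p in name.split(separator) if p.strip()]
        # Replace the combined row with one row per individual,
        # carrying the shared account reference onto each new row.
        connection.execute("DELETE FROM customer WHERE id = ?", (row_id,))
        connection.executemany(
            "INSERT INTO customer (name, account) VALUES (?, ?)",
            [(part, account) for part in parts],
        )
    connection.commit()
    return len(rows)


SPLIT_COUNT = decouple(conn)
RESULT = conn.execute(
    "SELECT name, account FROM customer ORDER BY name"
).fetchall()
print(SPLIT_COUNT, RESULT)
```

In practice the original combined row would more likely be archived than deleted, so that the change remains auditable; the delete here only keeps the sketch short.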

Forensic Analysis

For those enterprises that do not subscribe to an architected framework to manage their problem and solution spaces, we offer, as part of our Data Quality services, a rigorously disciplined alternative approach to analysis, design and development referred to as the Forensic Methodology. The Forensic Methodology is the application of FORENSIC DISCOVERY to existing operational or production system components to derive an accurate working ARTIFACT CATALOG. The methodology is characterized by the rigorous recording of investigative information resulting from reconnaissance activity on selected production systems. It takes its roots from Forensic Science, which has served and continues to serve many professional and academic disciplines.

Business Outcome Delivered

Gavroshe assists The Organization in defining the following: