The issues associated with performing QA and data validation in an enterprise-wide business intelligence (BI) initiative are complex, and in many ways transcend the QA tenets appropriate for online transaction processing (OLTP) systems. Attempts at establishing effective data validation often fail because of the sheer scale of the task.
The growth of business intelligence in insurance is driven by a number of key factors, one of the most important being an insurer's compelling need for a "single version of the truth." To achieve this lofty goal, an overall BI strategy must be defined, organized, developed and tested. It is the testing component, or more specifically the validation of the data, that is the subject of this paper. We use the term "QA" throughout. For our purposes, QA refers to the process of testing and validating the data used to populate the BI system being implemented; it does not refer to the functional validation that is more appropriate to an OLTP environment.
This is a Business Initiative - Not Just an IT Effort

IT does not create the data, nor does it use the data in a business context. IT is asked to store the data and to provide the mechanisms to get it safely into the computer systems. IT is also required to provide the necessary reporting assistance for using the company's data. Because only the business actually creates, uses and understands the data, it is logical that the business must have significant ownership in the process of validating the data that will fuel the company.
Our joint findings from the project confirm this notion, along with the following:
• Create Data Stewardship - Data stewardship should be implemented early in the project or, better yet, before the project begins. Typically, stewardship is not vested in a single individual; it is a shared responsibility for, and "ownership" of, data integrity among key functions in the company. This should typically involve Finance, Underwriting, Claims and Actuarial, as well as Marketing and Premium Audit. The validation of data is an important component of such a program. Most often, the business processes and rules that drive the collection of the data are the chief culprits behind poor data quality. Ownership by the business functions makes it easier to recognize the root causes of data quality issues and to implement remediated processes.
To maximize the chance of a successful data validation effort, it is critical that appropriate expectations are set regarding the process and the metrics that define success or failure.
• Is Perfection Achievable? - Let's start with the simple fact that there is no such thing as "perfect" data. Because human beings record the data at the outset of the process, human error is inevitable. Additionally, many fields in most processing systems are unedited, or poorly edited, and therefore prone to error. To prevent issues during data validation, critical steps at the outset are:
Once the business side is engaged and the level-setting process is complete, it is time to start the data validation. There are several key best practices that should be understood and employed to maximize the chances of success:
• How Much Data? - We have found that it makes sense to "smoke test" the earliest phase of QA by creating a sample of approximately 1,000 policies and their associated claims. The policies selected should represent all lines of business, product types and associated coverages. The QA team should work with the business functions to determine whether there are unique conditions that should be identified and brought into this initial test bed. This "smoke test" should exercise the transformations and should be used to validate against the source systems on a policy-by-policy and claim-by-claim basis. Because the processing source systems are a moving target, a snapshot, or "frozen data state," of each source system should be taken and made available to the QA staff so that there is an "apples-to-apples" comparison between the processing system data and the BI data.
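The policy-by-policy reconciliation described above can be sketched in code. The following is a minimal illustration only; the record structures, field names and sample values are hypothetical, not drawn from any actual source system.

```python
# Minimal sketch of a policy-level "smoke test": reconcile a frozen
# snapshot of the source system against the data loaded into the BI
# layer. All record structures and field names here are hypothetical.

def reconcile(source_snapshot, bi_data, fields):
    """Compare each sampled policy field by field; return mismatches."""
    mismatches = []
    for policy_id, source_rec in source_snapshot.items():
        bi_rec = bi_data.get(policy_id)
        if bi_rec is None:
            # Policy never made it through the load at all.
            mismatches.append((policy_id, "missing in BI layer", None, None))
            continue
        for field in fields:
            if source_rec.get(field) != bi_rec.get(field):
                mismatches.append(
                    (policy_id, field, source_rec.get(field), bi_rec.get(field))
                )
    return mismatches

# Illustrative data: one premium value was transformed incorrectly.
source = {"P001": {"line": "WC", "premium": 1200.00},
          "P002": {"line": "Auto", "premium": 850.00}}
bi     = {"P001": {"line": "WC", "premium": 1200.00},
          "P002": {"line": "Auto", "premium": 8500.00}}

issues = reconcile(source, bi, ["line", "premium"])
```

Because both sides of the comparison are static (the frozen snapshot and the loaded BI data), any mismatch the check surfaces points to a transformation or load defect rather than to ordinary transaction activity in the source system.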
Of course, the use of the BI solution as the validation tool presupposes that the semantic layer in the tool has been completed to the extent required to support the effort. As a best practice, it is important to build the semantic layer before the data validation effort commences.
• Metadata Use - In a best-practices BI environment, a complete source-to-target metadata management layer becomes a major asset during the data validation stage. A key component of our mutual success with the Montana State Fund was the development and implementation of a metadata solution that provided complete data lineage detail, including transformation activities, from the individual sources through the BI, or reporting, layer. This gave the data validation team the ability to instantly assess the path the data travelled to its ultimate reporting destination. It also enforced the business rules and definitions finalized at the beginning of the process.
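A source-to-target lineage record can be as simple as a table mapping each reporting field back to its source field and the transformation applied along the way. The sketch below assumes such a tabular store; the field names and transformation descriptions are illustrative, not taken from any particular metadata product.

```python
# Sketch of a source-to-target data lineage store, assuming a simple
# list of mapping records. All names and transforms are illustrative.

lineage = [
    {"target": "rpt.earned_premium",
     "source": "pms.prem_amt",
     "transform": "summed by policy term, pro-rated by days in force"},
    {"target": "rpt.claim_count",
     "source": "claims.clm_id",
     "transform": "distinct count, excluding voided claims"},
]

def trace(target_field):
    """Return the source field(s) and transformation(s) behind a
    reporting field, so a validator can see the full path at a glance."""
    return [rec for rec in lineage if rec["target"] == target_field]
```

When a validator finds a discrepancy in, say, earned premium, a single lookup against the lineage store identifies both the source field to re-check and the transformation rule to re-test, rather than requiring a search through ETL code.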
We hope we have made the point that data validation in an enterprise-wide BI initiative is a challenging task. The sheer weight of validating potentially trillions of data combinations demands that an appropriate process be planned and put in place to address this critical need.
To maximize your success potential: