In the claim-management arena, we have evolved from simple life forms only able to spit out closing ratios to super users who are able to extract reports from our claim systems on any subject senior management can conjure.
As anyone who has spent time on a computer knows, "garbage in, garbage out" is an accurate phrase connoting that even our most efficient supercomputer will not produce quality analysis if it is not supplied with quality data in the first place. An apocryphal tale involves two co-workers examining the output of their new computer. Said one to the other, "Do you realize it would take 400 people at least 200 years to make a mistake this big?"
We spend a great deal of time manipulating data from our systems, looking for loss and expense drivers, claim trends, and other insights. A whole cottage industry is developing in the area of predictive analytics—the process of analyzing current and historical data to make predictions about future events. The insurance industry has been an early adopter of these methods, especially in the fields of fraud and credit scoring. The uses of predictive analytics are limited only by the imagination of those employing the techniques.
As claim organizations, we are investing significant sums to extract meaningful information from the mountains of data we collect every day. As Carly Fiorina, former chairperson of Hewlett-Packard, noted, "The goal is to transform data into information, and information into insight." But the information obtained is meaningful only if it is accurate. In our collective experience, the quality of data gathering and recording in many organizations is below acceptable standards.
Risk Management Strategy
It is imperative that we have complete confidence in the accuracy of the data we rely upon to drive business decisions and analyze results. Otherwise, it is not only possible, but likely, that we will draw erroneous conclusions to the detriment of our organization's financial well-being. A proven risk management strategy includes conducting a comprehensive audit of the accuracy of your company's data. This monograph examines how to create an internal data integrity process.
Creating a project plan is the first step in your risk management study. What kinds of information or variables are you likely to analyze? In general, most companies analyze loss and expense drivers on the basis of discrete data points, such as age, sex, or jurisdiction. For internal purposes, you may also wish to include some qualitative metrics based on temporal considerations, which might include lag studies, timeliness of investigation, or policyholder contact. By thinking about the kinds of conclusions you hope to draw from your data, you can design a useful audit template. We suggest starting at a rather basic level, measuring only a couple of "gimme" data points, that is, data points that should always be correct, such as date of loss or date of report.
The second step requires you to carefully craft precise operational definitions of each audit criterion. Everyone thinks they have a firm grasp of what is meant by "date of loss," but, as we all know, the date-of-loss concept can be nuanced by the line of business and by the custom and practice of a particular company's claim processes. We recommend holding a brief meeting not only with claim management, but also with data entry personnel to validate that the items audited have a common and consistent meaning.
How many data points should you measure? For those of us who have long forgotten our statistics classes, you may wish to engage some assistance on this point from your actuarial or finance teams, although many Web sites offer free sample-size calculators with instructions. Essentially, the question of sample size comes down to how accurate the audit results need to be and how much time you have to complete the audit. Your objective is to pick a random sample that is representative of the entire data population; the degree of certainty that the sample is representative is known as the "confidence level." The sample size will also determine the margin of error, also known as the "confidence interval." As with most things in life, tradeoffs may occur.
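For readers who prefer to see the arithmetic, the standard sample-size formula for a proportion, with a finite-population correction, takes only a few lines of Python. This is a minimal sketch; the 10,000-claim population and the 95 percent confidence / 5 percent margin-of-error settings are hypothetical, and p = 0.5 is the conservative worst-case assumption about the true error rate.

```python
import math

# z-scores for common confidence levels
Z_SCORES = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}

def sample_size(population, confidence=0.95, margin_of_error=0.05, p=0.5):
    """Sample size for estimating a proportion, with finite-population correction."""
    z = Z_SCORES[confidence]
    n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2  # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)                # finite-population correction
    return math.ceil(n)

# Hypothetical example: 10,000 claims, 95% confidence, +/-5% margin of error
print(sample_size(10_000))  # -> 370
```

Note how the finite-population correction matters: tightening the margin of error to 3 percent roughly triples the required sample, which is exactly the accuracy-versus-time tradeoff described above.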
Now you're ready to conduct the data audit. We prefer to test a handful of data points in our initial audit, and we always construct an audit template to record the results. A sample audit template is found in Figure 1. Note that the auditor records only whether the electronic data point is or is not validated by the source documentation. You could include additional columns to record the values found in the electronic file, the values found in the source documentation, and a "comments" field.
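The template itself need not be elaborate. Here is a minimal sketch in Python of the kind of record layout described above, one row per claim per data point audited; the field names and sample values are illustrative, not taken from Figure 1.

```python
import csv

# Illustrative audit-template columns, mirroring the description above.
FIELDS = [
    "claim_number",   # file being audited
    "data_point",     # e.g., "date_of_loss", "date_of_report"
    "validated",      # "Y" if the system value matches the source documentation
    "system_value",   # optional: value found in the electronic file
    "source_value",   # optional: value found in the source documentation
    "comments",       # optional: auditor notes
]

with open("audit_results.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerow({
        "claim_number": "WC-0001",
        "data_point": "date_of_loss",
        "validated": "N",
        "system_value": "2011-03-02",
        "source_value": "2011-03-01",
        "comments": "System date is one day later than the first report of injury",
    })
```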
If more than one person will be conducting the audit, it is advisable to engage in pre-audit testing known as a Gauge R&R test procedure. This procedure provides assurance that the audit results can be both repeated and reproduced, no matter who is performing the audit or how many times the audit is performed.
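In its simplest attribute form, a Gauge R&R study has each auditor score the same files more than once: repeatability is the rate at which an auditor agrees with his or her own earlier calls, and reproducibility is the rate at which different auditors agree with one another. The following is a deliberately simplified sketch, not a full Gauge R&R procedure, and the scores are invented.

```python
# Simplified attribute Gauge R&R: each auditor scores the same four files twice.
# "Y"/"N" are validation calls per file; all data here are invented.
scores = {
    "auditor_A": [["Y", "N", "Y", "Y"], ["Y", "N", "Y", "Y"]],  # trial 1, trial 2
    "auditor_B": [["Y", "N", "Y", "N"], ["Y", "N", "Y", "Y"]],
}
n_files = 4

def agreement(trial_sets):
    """Fraction of files on which every trial set gives the same call."""
    return sum(len({t[i] for t in trial_sets}) == 1 for i in range(n_files)) / n_files

# Repeatability: does each auditor agree with their own earlier calls?
for auditor, trials in scores.items():
    print(f"{auditor} repeatability: {agreement(trials):.0%}")

# Reproducibility: do the auditors agree with each other (first trials compared)?
first_trials = [trials[0] for trials in scores.values()]
print(f"reproducibility: {agreement(first_trials):.0%}")
```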

Analyzing the Audit
Analyzing the audit results involves several steps. The most obvious step collates the audit results into a suitable reporting and analysis format. The most difficult step lies in interpreting what the results mean for your organization. This requires root-cause analysis to determine the reasons underlying the entry of incorrect data. A mistake may be due to simple operator input error, or it may be the result of unclear operational definitions.
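The collation step, at least, is mechanical. A few lines of Python can roll the template rows up into an accuracy rate per data point and flag anything that falls below your quality standard; the 95 percent threshold below is illustrative, not a prescribed value, and the file name assumes the template sketch shown earlier.

```python
import csv
from collections import defaultdict

QUALITY_STANDARD = 0.95  # illustrative threshold, not a prescribed value

counts = defaultdict(lambda: {"validated": 0, "total": 0})
with open("audit_results.csv", newline="") as f:
    for row in csv.DictReader(f):
        c = counts[row["data_point"]]
        c["total"] += 1
        c["validated"] += row["validated"] == "Y"

for data_point, c in sorted(counts.items()):
    accuracy = c["validated"] / c["total"]
    flag = "" if accuracy >= QUALITY_STANDARD else "  <- investigate root cause"
    print(f"{data_point}: {accuracy:.1%} ({c['validated']}/{c['total']}){flag}")
```

The flagged data points are where the interpretive work begins, since the numbers alone cannot distinguish operator error from a flawed operational definition.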
Even if your initial selection of criteria passes the established quality standards, you're not done. First, you may want to audit additional data points that you expect to be used in data analytics to confirm that your initial audit findings are true across a wider range of data input. Second, even if you have selected enough data points in your initial audit, you will need to re-test your data accuracy on a regular basis.
For the sake of argument, let's say that your data input is less than optimal. Figures 2 and 3 represent audit results measuring two key claim components: Date of Loss and Date of Report.
Figure 2 indicates that data entry personnel miscoded the date of loss. This particular audit involved a client's workers' compensation line of business, and the operational definition of date of loss was "Date of injury listed on the first report of injury filed with the applicable state agency (Division of Workers' Compensation)." The results were not only surprising, but indicative of the amount of data cleanup that was required before any meaningful data mining or data analytics could be performed.
Similarly, Figure 3 demonstrates the accuracy level of the data entry staff in recording the date the claim was reported to the insurance company. In this instance, our operational definition, which was taken from the company's claim procedure manual, required the input of the actual date stamp on correspondence (all incoming mail was stamped the date received), the date the fax was received (as indicated by the time/date stamp), or the date the e-mail was sent (presumed to have arrived the same day).
It didn't take a rocket scientist to determine, "Houston, we have a problem." But the thorny question that remained was: what could be causing such a result? As it turned out, the problem was not careless data entry but a misunderstanding of the operational definition. The data entry operators were using the date the claim information was entered into the system as the "date of report," even when that was a day or two after the initial receipt of the report.
Both of these audit findings required that the data-integrity issues be addressed immediately. After consulting with the company's senior management, we convened a meeting of the data entry personnel. The first agenda item reviewed the operational definitions for each key data point to be used in ongoing claim analysis. Where appropriate, the operational definitions were clarified and changed, and examples recorded for future training and review sessions. These operational definitions were reduced to writing and given to each data entry operator.
The next step was the key to success. Data entry operators were informed that performance would no longer be measured solely on speed or number of claim transactions processed, but instead on the accuracy of the data input into the system. Their annual performance goals and objectives would include periodic measurements of data accuracy.
The audit process was described in detail to the data entry personnel, including the number of files to be reviewed on a regular basis, audit criteria, and operational definitions. The purpose of the audit was twofold: to ensure data accuracy and integrity, and to provide additional "objective" criteria to measure job performance. A side benefit for management is that it also provides valuable information for ongoing training and education.
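If the audit template also records which operator entered each item, an extra column not shown in Figure 1 and assumed here for illustration, the same audit rows can be rolled up per operator to support the performance measurement described above. A hedged sketch:

```python
import csv
from collections import defaultdict

# Assumes an "operator" column was added to the audit template sketched earlier.
counts = defaultdict(lambda: {"validated": 0, "total": 0})
with open("audit_results.csv", newline="") as f:
    for row in csv.DictReader(f):
        c = counts[row["operator"]]
        c["total"] += 1
        c["validated"] += row["validated"] == "Y"

for operator, c in sorted(counts.items()):
    print(f"{operator}: {c['validated'] / c['total']:.1%} data accuracy")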
Final Warning
As organizations move toward spending additional resources on data mining and data analytics, it is imperative that the analysis fairly and accurately reflect the company's results. We are big believers in the old data analysis adage, "Torture the data until it confesses." However, unless you have taken solid risk management steps to ensure the integrity of your data, you may convict your colleagues of crimes they didn't commit.