Building predictive models to estimate claim risk is not revolutionary. Multivariate analysis and regression have been used to predict outcomes and behavior since the end of World War II. Yet, as discussed in my previous article, the challenge has been to engage actuaries in the use of these techniques.

While actuaries are certainly very knowledgeable in the mechanics of these techniques, very challenging data environments have historically limited their ability to apply them. The advent of the data mining discipline now allows actuaries to leverage these more advanced techniques and overcome those data obstacles.

To encourage broader adoption of these techniques within the actuarial community, some data mining practitioners have built models that predict differentials as opposed to an actual score. The differential is a concept that is familiar terrain for all actuaries.

In the actuarial world, actuaries analyze different groups based on certain criteria and determine how different a given group is from the average. For example, a certain book of business may have an overall claim frequency of two percent, while, within that book, males under 25 years old have a claim frequency of four percent, a differential of 200 (4% / 2% × 100).
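The differential arithmetic above can be sketched in a couple of lines. This is a minimal illustration using the two percent and four percent figures from the example; the variable names are my own:

```python
# Figures from the example: a book with an overall claim frequency of 2%,
# and one segment (males under 25) with a claim frequency of 4%.
overall_frequency = 0.02
segment_frequency = 0.04

# A differential expresses a segment relative to the book average,
# scaled so that 100 means "no different from the average".
differential = segment_frequency / overall_frequency * 100
print(differential)  # 200.0
```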

This scenario might represent just one of a hundred different scenarios that the actuary assesses when trying to determine the best differentials and the key variables that produce them. But the critical difference from the data mining approach is that the actuary is always analyzing groups of records as opposed to individual records.

With the actuarial approach, solutions are based on identifying business rules or policyholder characteristics that produce the strongest differentials. At the end of this analysis, the solutions ultimately yield groups of policyholders that can then be prioritized based on the differentials.   

In the world of data mining and predictive analytics, practitioners typically deal in scores that represent the outcome of an actual predictive model. This differs from the traditional actuarial approach in that separate scores are produced for each policyholder as opposed to defining a differential for a policyholder based on the group that he or she is in.

Let's take a look at an example where we attempt to predict the probability of a policy having an auto claim using three variables: number of drivers on the policy, age of vehicle, and weight of vehicle. A logistic model of this kind takes the following form:

Prob(Claim) = 1 / (1 + exp(-z)), where

  z = -3.44

    - 0.2472 × number of drivers on the policy

    - 0.1052 × age of vehicle

    - 0.0003 × weight of vehicle
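Scoring a single policy with this equation is straightforward. The sketch below uses the illustrative coefficients quoted above; the example input values are my own assumptions, and a real model would of course be fit to the insurer's own data:

```python
import math

def claim_probability(num_drivers, vehicle_age, vehicle_weight):
    """Score one policy with the illustrative logistic model from the text."""
    # Linear predictor z, using the coefficients quoted in the article.
    z = (-3.44
         - 0.2472 * num_drivers
         - 0.1052 * vehicle_age
         - 0.0003 * vehicle_weight)
    # Standard logistic transform maps z to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical policy: two drivers, a 5-year-old vehicle weighing 3,000 lbs.
p = claim_probability(num_drivers=2, vehicle_age=5, vehicle_weight=3000)
print(round(p, 4))
```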

In the example above, the probability itself is the score: the probability that the policyholder will have a claim. Yet, if we are trying to make these tools more actionable in the P&C world of actuaries, wouldn't it be more useful if we could somehow convert the output to differentials?

Through the use of generalized linear modeling (GLM) techniques, each model variable can be converted to a differential by multiplying the variable's coefficient by the difference between the policyholder's actual value and the variable's mean, and then exponentiating the result.

Let's take a look at the example of age of vehicle, where the differential is calculated as follows:

DIFF = exp(-0.1052 × (age of vehicle - 5.22))

Here, -0.1052 is the model coefficient for age of vehicle, and 5.22 is the average age of all vehicles in the book.

This kind of calculation occurs for each model variable. At the end of this process, one merely multiplies the model variable differentials together to get an overall claim frequency differential.
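The per-variable conversion and the final multiplication can be sketched as follows. Only the vehicle-age pair (coefficient -0.1052, mean 5.22) comes from the article; the other coefficients match the earlier equation, and the means for drivers and weight are assumed values for illustration:

```python
import math

# Coefficients and book-wide means. The vehicle-age values are from the
# article; the means for drivers and weight are assumed for illustration.
model = {
    "num_drivers":    {"coef": -0.2472, "mean": 1.8},
    "vehicle_age":    {"coef": -0.1052, "mean": 5.22},
    "vehicle_weight": {"coef": -0.0003, "mean": 3200.0},
}

def variable_differential(name, value):
    """exp(coefficient x (value - mean)) for one model variable."""
    v = model[name]
    return math.exp(v["coef"] * (value - v["mean"]))

def overall_differential(policy):
    """Multiply the per-variable differentials to get the overall differential."""
    result = 1.0
    for name, value in policy.items():
        result *= variable_differential(name, value)
    return result

# Same hypothetical policy as before: two drivers, 5-year-old, 3,000 lb vehicle.
policy = {"num_drivers": 2, "vehicle_age": 5, "vehicle_weight": 3000}
print(round(overall_differential(policy), 4))
```

A value above 1.0 means this policyholder is riskier than the book average; multiplying by 100 puts it on the same scale as the 200 differential in the earlier group-level example.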

The advantage of creating differentials using this approach is that they are created at the individual policyholder level as opposed to the group level, such as males under 25 years of age. As discussed in the previous blog, this more granular approach will always produce superior results to the traditional actuarial approach of creating differentials at a group level.

The other significant advantage of building models in this fashion arises in more heavily regulated environments. When rates have to be filed with a regulatory body, it can be extremely difficult and challenging for regulators to accept models with scores as output.

Because many of these regulators are actuaries by profession, there will be some resistance to the use of predictive models as a way of filing rates, since the use of these techniques for rate filing purposes is outside their comfort zone. This resistance will be even more pronounced if model scores are going to be used as inputs into the filing structure.

However, by changing the model outputs from scores to differentials, the models yield outputs that actuaries conceptually understand, thereby making these models more palatable as a way of filing rates for pricing.

The business benefit of using predictive models over more traditional approaches is becoming widely accepted throughout the industry. Organizations that refuse to adopt these techniques will become uncompetitive relative to those that take advantage of them.

The continued evolution of the data mining discipline will simply enhance the adoption and use of these techniques. Using these techniques to produce output such as differentials will further accelerate the use of predictive models as a business standard throughout the P&C industry.   
