Risk models are only as effective as the data being fed into them. (iStock)

Since ship owners and underwriters began gathering at Lloyd's Coffee House to insure voyages across the British Empire, insurance has been about proper risk selection. To aid in determining which risks to take and which to pass on, insurers long ago began developing statistical models. As an early example, in 1654 the mathematicians Blaise Pascal and Pierre de Fermat laid the foundations of probability theory in their correspondence on games of chance, making it possible to quantify different levels of risk. Pascal's triangle and the probability theory behind it led to the first actuarial tables, which are still used today when calculating insurance rates.

Good data matters

However, since models came into use, their effectiveness has only been as good as the data fed into them. Without good data, insurers are left operating on unverified assumptions and guesswork, so more accurate and granular data collection has only grown in importance over time. In 1693, for example, the astronomer Edmond Halley produced the first rigorous mortality table, built from birth and death records for the city of Breslau. (Halley published the table, along with its implications for pricing annuities, in the Philosophical Transactions of the Royal Society.)

Such first-of-its-kind data would later support the development of better life insurance models. Similar data accrued over time for sea voyages, as underwriters collected information on how ship type, season and route affected the odds of a successful voyage.

Modernization beckons

Broadly speaking, it's becoming apparent that we've squeezed a lot of performance out of our models, and that the biggest gains going forward will come from better data.

Every so often, we see the adoption of new data that allows insurers to generate better models and select better risks. Over time, this new data becomes standard for selecting and pricing risk ― but not before providing early adopters with a large competitive advantage while pushing other insurers to the brink of insolvency.

As underwriting strategies have grown more sophisticated and competition has intensified over the centuries, the potential for better risk selection using new data has increased. At the same time, carriers have become more susceptible than ever to selecting poor risks (adverse selection).

Boots on the ground

In the past few decades, there have been examples of emerging data sources that drive better risk selection and have proven tremendously valuable to carriers, especially early adopters. In 1991, for example, Progressive became the first auto insurer to pilot the use of consumer credit history and credit scores as a rating variable to develop more accurate policy premiums. By 1996, it had rolled the practice out nationwide, and by 2006, almost every insurer was using credit scores to set prices.

Following Hurricane Andrew in 1992, primary carriers began using fluid dynamics and climatological data to create the first catastrophe models. These models informed rate-making and enabled the industry to shift from assessing risk at the portfolio level, based on past claims, to a more granular level that accounts for differences in the geographic and structural characteristics of individual properties.

Now, we are at a new inflection point, driven by three simultaneous technology innovations:

  1. The ubiquity of big, digital datasets that can be accessed in real-time, as needed;
  2. New forms of raw data capture via technologies like sensors and aerial imagery; and
  3. The development of machine learning algorithms to sort and make sense of data automatically, on a massive scale.

These leaps forward have exposed numerous new data sources directly tied to insured losses. While emerging tech-first carriers like Lemonade or Metromile may have an advantage in ingesting and analyzing massive quantities of unconventional data, a few critical new data inputs are already seeing broad adoption by forward-looking incumbent insurance carriers:

Imagery-derived intelligence

Today, imagery-derived data is fast becoming the next critical source of risk information. Imagery has reached broad acceptance as a claims tool, with major vendors like Verisk Analytics, Eagleview and CoreLogic supporting its use. Until recently, however, its adoption in underwriting was held back by the lack of AI tools, like computer vision, that can analyze imagery automatically and, most importantly, at massive scale to create a homogeneous dataset for assessing and predicting risk.

Traditionally, the exterior condition of a roof could only be assessed by ordering an individual property inspection. Using computer vision and geospatial imagery, my company has developed Roof Condition Rating, a property attribute that assesses the condition of the exterior roof and enables insurers to easily identify roofs that show signs of degradation.

Roof condition is strongly predictive of loss, offering advantages over earlier signals such as roof age or tax records, which served only as roundabout proxies for roof condition. It is also superior to more costly in-person inspections, which cannot be completed in advance at the time of quote. Unlike roof age or inspection data, roof condition is a direct measurement of a feature that matters deeply to underwriters, and it can be collected and processed in an algorithmically uniform way, nationwide, with better-than-human accuracy. Going forward, similar forms of data that rely on AI and imagery will increasingly become the gold standard across the industry.
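To make the idea concrete, here is a minimal, hypothetical sketch in Python of how an imagery-derived roof attribute could be produced: a classifier already trained on labeled roof imagery scores each property's aerial tile into a handful of condition bands. The model, labels and preprocessing shown are illustrative assumptions only, not a description of Cape Analytics' actual pipeline.

    # Hypothetical sketch: scoring roof condition from one aerial image tile.
    # Assumes a classifier already fine-tuned on labeled roof imagery (not
    # shown here); the labels and preprocessing are illustrative only.
    import torch
    from PIL import Image
    from torchvision import transforms

    CONDITION_LABELS = ["severe", "poor", "fair", "good", "excellent"]

    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ])

    def rate_roof(image_path: str, model: torch.nn.Module) -> dict:
        """Return a roof-condition label and its probability for one parcel."""
        tile = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
        with torch.no_grad():
            probs = torch.softmax(model(tile), dim=1).squeeze(0)
        best = int(probs.argmax())
        return {"rating": CONDITION_LABELS[best], "confidence": float(probs[best])}

    # Usage, given a trained model: print(rate_roof("parcel_1234.png", model))

Because the same code can run over every rooftop in an imagery archive, the output is a uniform attribute that can be appended to a rating plan or used to flag properties for review.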

Moreover, this kind of granular data has the potential to change how insurers do business, moving them from a reactive posture to proactive positive selection, with much greater ROI on marketing activities.

Usage-based insurance

Just five years ago, insurance policies were written almost exclusively on an annual basis; there simply wasn't enough information about consumers or assets to create policies on a shorter time frame. Now, inexpensive but powerful sensors, machine learning and ubiquitous cellular connectivity have supercharged the flow of information. Together, these technologies generate and transmit massive amounts of critical data, which can be used to calculate risk far more often and to cover an asset or individual only when necessary.

For example, prior to telematics, insurers had no idea how an asset was being used. Now, usage can be measured instantly, allowing for real-time insurance policies that cover people depending on the situation, for a limited amount of time. Companies like Metromile are doing this for millennials who drive infrequently and prefer paying by the mile. Other companies, like Slice Labs, are creating policies for ridesharing, covering the driver only while passengers are in the car, and for home sharing, covering a home for the two days a week it is rented out to vacationers.

In the meantime, signals within the data, like brake pressure and acceleration in a connected car, can tell insurers a lot about driving safety and the likelihood of an accident. Expect the possibilities in this realm to expand as 5G rolls out and wireless connectivity becomes fast enough to deliver even larger amounts of granular data in real time.
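As a rough illustration of how such signals could feed pricing, the Python sketch below combines mileage with counts of hard-braking and rapid-acceleration events into a simple per-mile premium. Every constant, threshold and weight here is an assumption chosen for illustration; this is not Metromile's, or any carrier's, actual rating formula.

    # Hypothetical usage-based pricing sketch. Rates, thresholds and weights
    # are illustrative only and do not reflect any carrier's actual model.
    from dataclasses import dataclass

    @dataclass
    class Trip:
        miles: float
        hard_brakes: int     # braking events beyond a deceleration threshold
        rapid_accels: int    # acceleration events beyond a threshold

    BASE_MONTHLY = 29.00     # fixed monthly charge (illustrative)
    PER_MILE = 0.06          # base per-mile rate (illustrative)

    def risk_multiplier(trips: list) -> float:
        """Scale the per-mile rate by the frequency of risky driving events."""
        miles = sum(t.miles for t in trips) or 1.0
        events_per_100 = 100 * sum(t.hard_brakes + t.rapid_accels for t in trips) / miles
        # Each risky event per 100 miles nudges the rate up 2%, capped at +50%.
        return min(1.0 + 0.02 * events_per_100, 1.5)

    def monthly_premium(trips: list) -> float:
        miles = sum(t.miles for t in trips)
        return round(BASE_MONTHLY + miles * PER_MILE * risk_multiplier(trips), 2)

    # Example: a light-usage month with a few hard-braking events.
    print(monthly_premium([Trip(120.0, 3, 1), Trip(45.5, 0, 0)]))

The shape of the calculation, a small fixed charge plus a variable charge tied to measured usage and driving behavior, reflects the general idea behind usage-based products, though real rating plans are far more involved.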

Risk-specific sensors

Lastly, we are now seeing the rollout of sensors paired with AI to monitor risk-specific conditions with far greater accuracy and generate highly actionable data. For example, Pillar Technologies is deploying cutting-edge sensors on construction sites that can warn property owners, insurers and contractors of fire or freeze risks. Home sensors like Notion are well suited to monitoring for water leaks and can advise homeowners to call a plumber when a leak is detected. Finally, Nest and Amazon are creating security cameras that use computer vision to distinguish a homeowner's face from an unknown person at the front door, providing a real-time signal of security risk before an event occurs.
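For a sense of how raw readings become an actionable warning, here is a minimal Python sketch of a rule that flags a sustained freeze risk from periodic temperature readings. The threshold and window size are illustrative assumptions; production devices like Pillar's or Notion's combine many more signals and learned baselines.

    # Hypothetical sketch: turning periodic sensor readings into a freeze alert.
    # Threshold and window values are illustrative only.
    from collections import deque

    FREEZE_THRESHOLD_F = 36.0   # warn before pipes approach freezing
    WINDOW = 6                  # consecutive readings (e.g., one per 10 minutes)

    recent_temps = deque(maxlen=WINDOW)

    def ingest_temperature(reading_f):
        """Return an alert string if a sustained freeze risk is detected."""
        recent_temps.append(reading_f)
        if len(recent_temps) == WINDOW and max(recent_temps) < FREEZE_THRESHOLD_F:
            return ("Freeze risk: temperature has stayed below "
                    f"{FREEZE_THRESHOLD_F}F for the last {WINDOW} readings.")
        return None

    # Example: a steady overnight temperature drop at a job site.
    for temp in [41.0, 38.5, 35.9, 35.2, 34.8, 34.1, 33.5, 33.0]:
        alert = ingest_temperature(temp)
        if alert:
            print(alert)

The same pattern, a stream of readings checked against a condition that matters for loss, underlies water-leak detection and other sensor-driven warnings; the value to insurers comes from getting the warning early enough for someone to act on it.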

These IoT devices are now becoming smart enough to predict an upcoming risk, or at least warn insurers and consumers about it, with enough time to take action. As devices improve their ability to pick up on signals of risk, insurers will start to offer huge incentives to make sure their customers are using the technology.

All three of the above examples ― imagery-derived intelligence, usage-based insurance and risk-specific sensors ― clearly show that new forms of data are changing how insurers select the risks they decide to cover. The insurers who embrace new data will enjoy a first-mover advantage and separate themselves from the pack, while slower-moving insurers will be at far greater risk of adverse selection. Insurers who are selected against will likely rack up greater losses as they are forced to take on poorer risks, and will eventually be priced out or outmaneuvered by competitors with a superior product.

History shows there is a reason to move quickly in adopting loss-correlated data. Otherwise, we may once again see another crop of insurers disappear.

Ryan Kottenstette ([email protected]) is the CEO of Cape Analytics, a California company that uses AI and geospatial imagery to provide insurers with instant property intelligence. These opinions are the author's own.
