One of the greatest challenges for any company is the inability to aggregate data and content in a way that makes it truly meaningful. Let's face it: business and data analysts confront an ever-growing struggle with data and content sprawl across the enterprise. This leads to glaring inefficiencies in reporting, operations, underwriting, claims management, and the list goes on and on.
While many companies have sought solace in large-scale data marts, warehouses, and top-of-the-line reporting tools, many lack the expertise or personnel who truly know how to use them. That, coupled with an increasing number of data types, repositories, and content types, and in some cases terabytes or petabytes of unsegregated data, has caused companies to flounder when it comes to reporting efficiently and identifying meaningful trends in their business.
What is big data, and does your company suffer from it? Big data is a collection of data from traditional and digital sources inside and outside your company that represents a source for ongoing discovery and analysis.
Traditionally, data mining has been one of the most used methodologies for providing detailed, analytically significant data around marketing trends, claims, underwriting and liability management. Big data, when managed properly, can provide even more valuable insight into the markets and potentially aid with retention as well as deciding the next great place to expand existing or new lines of business. It also assists in product management and development.
Modern data mining models such as decision trees and neural networks can predict risk more accurately than current actuarial models; insurance companies can therefore set rates more precisely, which in turn yields better pricing and a stronger competitive position.
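To make the decision-tree idea concrete, here is a minimal sketch of how such a model scores risk once it has been trained: a cascade of simple splits on driver attributes. The attribute names and thresholds below are hypothetical, chosen purely for illustration, not drawn from any real rating model.

```python
# Minimal sketch of a trained decision tree as a cascade of splits.
# Attributes (age, prior_claims, annual_miles) and thresholds are
# hypothetical, for illustration only.

def risk_score(age: int, prior_claims: int, annual_miles: int) -> str:
    """Classify a driver into a coarse risk band via simple splits."""
    if prior_claims >= 2:
        return "high"
    if age < 25:
        return "high" if annual_miles > 15000 else "medium"
    return "medium" if annual_miles > 20000 else "low"

print(risk_score(age=22, prior_claims=0, annual_miles=18000))  # high
print(risk_score(age=45, prior_claims=0, annual_miles=8000))   # low
```

In practice the splits are learned from historical claims data rather than hand-written, but the scoring logic the model produces has exactly this shape, which is part of why tree models are easier to explain to regulators than neural networks.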
But data isn't just about databases and spreadsheets anymore. Data has evolved to include metadata that may live outside any data structure, non-relational or unstructured data, and social metrics that provide actionable insight into what the market and your potential customer base are doing.
Insurance companies need the ability to harness big data: used effectively alongside new data sources and sophisticated analytics, it can underpin an accurate methodology for pricing risk and for loss prevention.
Risk has historically been priced on broad probabilities. Insurance companies, however, need to control growth and understand risk as it applies to their own business, with accurate measures and dimensions, rather than relying on broad geographic and mortality tables that may not directly correlate to premiums. This is a huge factor for insurers, because better metrics mean premiums can be closely matched to a person's or a business's risk profile.
Automotive data from onboard computers can now be just as valuable to the property and casualty insurer as it is to the life and annuities insurer.
A few points of discussion around the methodologies behind big data and how to get the most out of it are relevant to the technically inclined:
Big data will not always be relational; in fact, in most cases it won't be. So for those who can only picture one data element having a direct relationship with another, for instance a policy to a claim or a claim to a reserve, this will be quite an eye opener.
In dealing with big data, there are three fundamental approaches:
- Massively parallel processing (MPP) databases
- "Not Only SQL" (NoSQL) frameworks
- Columnar databases

MPP and NoSQL both use cluster computing, in which a set of connected computers called nodes works together as a single system. The data is divided up and stored on different machines, and processing and analysis operations run locally, in a distributed fashion, on each node. Columnar databases are most effective with special types of data, such as fields that have a limited set of distinct values.
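The row-versus-column distinction is easy to see in miniature. The sketch below pivots the same (hypothetical) policy records into both layouts in plain Python; a real columnar database adds compression and indexing on top, but the core idea is the same: an aggregate over one field only has to touch that field's values.

```python
# Sketch of row-oriented vs column-oriented storage for the same table.
# The policy records are hypothetical, for illustration only.

rows = [  # row-oriented: one record per entry, all fields together
    {"policy": "P1", "state": "OH", "premium": 900},
    {"policy": "P2", "state": "OH", "premium": 1200},
    {"policy": "P3", "state": "TX", "premium": 1100},
]

# The same data pivoted into columns: each field's values stored together.
columns = {
    "policy":  [r["policy"] for r in rows],
    "state":   [r["state"] for r in rows],
    "premium": [r["premium"] for r in rows],
}

# Aggregating one column scans a single contiguous list,
# instead of pulling every field of every row off disk.
total_premium = sum(columns["premium"])
print(total_premium)  # 3200
```

Columns with few distinct values, like `state` above, also compress extremely well in this layout, which is why columnar stores shine on exactly that kind of data.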
MPP databases are relational; however, they are specially designed to "span" clusters, whether virtual or physical.
A NoSQL database takes a completely different approach to handling data. The largest NoSQL solution in the world today is Hadoop. Hadoop uses flexible data structures and can scale out on low-cost, lower-performance hardware. At its core is a programming model called MapReduce, which spreads processing across the nodes of the cluster. Hive, whose query language looks eerily like SQL, builds MapReduce programs in the background. But Hadoop is not for the faint of heart; you still need some database skills to use it, so unless you are a BA or DBA, the solution still requires knowledgeable people to deploy and administer it.
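The MapReduce model itself is simple enough to sketch in a few lines of plain Python. This is not Hadoop, just the pattern Hadoop distributes across cluster nodes: a map phase emits key/value pairs, a shuffle groups them by key, and a reduce phase combines each group. The claim records are hypothetical.

```python
from collections import defaultdict
from itertools import chain

# Plain-Python sketch of the MapReduce pattern. In Hadoop, the map and
# reduce phases run on different cluster nodes and the shuffle moves
# data between them; here everything runs in one process.

def map_phase(record):
    # Emit (key, value) pairs; here, one count per claim's state.
    yield (record["state"], 1)

def reduce_phase(key, values):
    # Combine all values emitted for one key.
    return key, sum(values)

claims = [{"state": "OH"}, {"state": "TX"}, {"state": "OH"}]

# Shuffle: group intermediate pairs by key.
grouped = defaultdict(list)
for key, value in chain.from_iterable(map_phase(c) for c in claims):
    grouped[key].append(value)

result = dict(reduce_phase(k, vs) for k, vs in grouped.items())
print(result)  # {'OH': 2, 'TX': 1}
```

A Hive query like `SELECT state, COUNT(*) FROM claims GROUP BY state` would compile down to essentially this program, which is what makes Hive so approachable for people who already know SQL.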
The biggest piece of the puzzle not yet discussed here is algorithms. You have all the data and a database or some kind of file store that holds it, but you still need a methodology for taking and creating measurable metrics and dimensions from it.
There are tool sets you can purchase for building algorithms around data, but most rely on three pieces: a filter over the data elements, an algorithm that makes them measurable, and a presentation layer that renders the result in a way that makes sense to a high-level, non-technical audience.
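The filter / algorithm / presentation pipeline described above can be sketched in a few lines. The records and field names here are hypothetical, and the "algorithm" is deliberately trivial (a loss ratio), but the three stages are the same ones any purchased tool set walks through.

```python
# Sketch of the three stages: filter the raw data elements, apply an
# algorithm to make them measurable, then present the metric for a
# non-technical audience. Records and fields are hypothetical.

records = [
    {"line": "auto", "loss": 5000, "premium": 9000},
    {"line": "auto", "loss": 7000, "premium": 8000},
    {"line": "home", "loss": 2000, "premium": 6000},
]

# 1. Filter: keep only the data elements relevant to the question.
auto = [r for r in records if r["line"] == "auto"]

# 2. Algorithm: reduce the filtered data to a measurable metric.
loss_ratio = sum(r["loss"] for r in auto) / sum(r["premium"] for r in auto)

# 3. Presentation: express the metric for a high-level audience.
print(f"Auto loss ratio: {loss_ratio:.0%}")
```

The value of a commercial tool set is mostly in scaling stages 1 and 2 across billions of records and in making stage 3 point-and-click, not in changing this basic shape.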
Currently only the largest insurers have traditional policy rating and claims data that would be considered “big”, i.e. potentially unmanageable with a relational database. But insurance companies of all sizes are encountering big data from new sources, such as their website traffic, vehicle telematics programs and social media.
Most companies have IT infrastructures that were not designed for the volume of data generated by their web traffic or telematics programs. This big data is very expensive, if not impossible, to manage using existing relational databases. Companies often give up and discard it, keep only summaries or very short histories, or outsource the storage to a vendor that provides only limited reporting and analysis capabilities. These choices leave a data scientist without the means to access, leverage and integrate the big data to find new insights and value.
For example, while social media monitoring is often outsourced, the next opportunities for insurers to leverage social media data are in areas such as distribution, underwriting, and claim fraud detection, all of which require integration with internal data sources. Telematics data can help identify marketing opportunities, streamline accident reconstruction, or recover stolen vehicles. To benefit from big data, insurers must gather, store, and access it to discover these insights and make better decisions.
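Gathering telematics data is only half the problem; it also has to be reduced to features an underwriter or claims analyst can use. The sketch below rolls raw (hypothetical) vehicle pings up into per-vehicle summaries, the kind of aggregation that is impractical if the raw feed was discarded or left with a vendor.

```python
from collections import defaultdict

# Sketch of summarizing raw telematics pings into per-vehicle features.
# The event fields (vin, speed_mph, hard_brake) are hypothetical.

pings = [
    {"vin": "V1", "speed_mph": 72, "hard_brake": False},
    {"vin": "V1", "speed_mph": 85, "hard_brake": True},
    {"vin": "V2", "speed_mph": 40, "hard_brake": False},
]

summary = defaultdict(lambda: {"max_speed": 0, "hard_brakes": 0})
for p in pings:
    s = summary[p["vin"]]
    s["max_speed"] = max(s["max_speed"], p["speed_mph"])
    s["hard_brakes"] += p["hard_brake"]  # True counts as 1

print(dict(summary))
```

Features like these can feed the rating models discussed earlier, which is exactly the internal integration that a summaries-only vendor arrangement forecloses.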
A major insight from both sides of the desk, whether at a small, midsized, or large property and casualty insurer: companies with operational databases but no data warehouse to facilitate access for analysts and modelers have a tremendous asset they cannot leverage. And a data warehouse without good tools for business analysts to access it is a significant lost opportunity.
Big data is here and companies need to figure out how to embrace it, as it will certainly provide a great deal of value as a tool in the right hands.