Virtually every industry is vulnerable to fraud. Each year, fraudulent activities account for billions of dollars lost in the insurance, banking, health care, retail, transportation, manufacturing, and communications industries. Likewise, fraudulent activity riddles our federal and local governments.
The U.S. General Accounting Office estimates that $1 out of every $7 spent on Medicare is forfeited to fraud and abuse. Depending on the reference, Medicare loses up to $20 billion to fraudulent or unnecessary claims each year. The insurance industry estimates that about 25 percent of each premium dollar is spent on covering fraudulent or inflated claims. This puts the yearly costs at an estimated $30 billion nationally.
A USA Today article stated that the identity theft epidemic has affected 27 million people over the last 5 years, with 10 million in 2002 alone. It is estimated that almost $50 billion has been lost to identity theft. To put these numbers into perspective, consider that only 69 of the 182 countries recognized in 2002 had a Gross Domestic Product (GDP) over $20 billion. In other words, the estimated $30 billion lost each year to fraud in the U.S. insurance market alone exceeds the GDP of more than half of the world's countries.
These numbers are staggering, especially considering that they are largely paid for by the consumer. More effective methods must be deployed to minimize these losses. Industry experts estimate that for each dollar spent on combating fraud, between $5 and $15 is saved, depending on the industry being served. This return on investment (ROI) is cumulative, as it minimizes future losses for the same fraudulent activities.
Discover, Then Classify Patterns
Flexibility remains critical for responding quickly to changing fraud patterns. It is crucial to dynamically expose new patterns of fraud without having to re-program, re-train, or re-invent the underlying systems. Remember that before you can classify patterns, you have to discover them. Discovering insurance fraud is not really any different from exposing money launderers, terrorists, smugglers, embezzlers, or other types of entities involved in elusive behaviors.
The data associated with workers' compensation, property and casualty (P&C), personal injury, and other types of insurance-related matters can be viewed in its most basic form: as interrelated objects. Generally there will be a subject — the policyholder, claimant, injured party, lawyer, doctor, and so on — addresses, phone numbers, accounts (policies), and, of course, the actual claims. How the objects are related is based on the nature of the claims submitted, and behaviors can be exposed through repeated claim submissions. It is this repeated behavior connecting the different objects that provides the patterns of interest.
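The "interrelated objects" view described above can be sketched as a simple graph. The following is a minimal illustration in Python, not a description of any particular product: the record layout, field names, and values are all hypothetical, and each claim is linked to the claimant, SSN, phone, and address it mentions.

```python
from collections import defaultdict

# Hypothetical claim records; field names and values are made up for illustration.
claims = [
    {"claim_id": "C1", "claimant": "A. Jones", "ssn": "123-45-6789",
     "phone": "555-0100", "address": "1 Elm St"},
    {"claim_id": "C2", "claimant": "B. Lee", "ssn": "123-45-6789",
     "phone": "555-0199", "address": "1 Elm St"},
]

def build_graph(records):
    """Return an undirected adjacency map linking each claim to its objects."""
    graph = defaultdict(set)
    for rec in records:
        claim_node = ("claim", rec["claim_id"])
        for field in ("claimant", "ssn", "phone", "address"):
            value = rec.get(field)
            if value:  # skip missing or empty fields
                node = (field, value)
                graph[claim_node].add(node)
                graph[node].add(claim_node)
    return graph

graph = build_graph(claims)

# Objects that appear on more than one claim are the candidate "patterns of interest".
shared = {node: nbrs for node, nbrs in graph.items()
          if node[0] != "claim" and len(nbrs) > 1}
```

In this toy data the SSN and the address each connect two claims, which is the kind of repeated connection the text describes. Whether such a connection is meaningful still depends on the interpretation an analyst brings to it.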
There have been a host of new technologies introduced into the insurance fraud marketplace over the past several years, including link analysis and other systems for detecting those relationships and associations that are not immediately obvious. Arguably more important are the analytical methodologies that have been refined to help interpret the complex networks and patterns presented by these technologies. An enhanced understanding of the data will inevitably lead to better pattern detection and, ultimately, to a lower incidence of fraud. Once a pattern has been exposed, it is up to the insurer to act on that knowledge by changing business processes to flag related or similar occurrences of the pattern. Remember that there are always exceptions to the rule.
In this example, the network structure represents data from a customer profile database and depicts a single social security number (SSN) connected to six policyholders (claimants). Initially, most special investigation unit (SIU) investigators would consider this a very questionable and suspicious situation, especially if the SSN appeared valid and the name of each claimant was different.
This network might not be considered suspicious if the SSN represented an invalid or common number, such as "999999999," "000000000," "unknown," or "not provided." Often, this can be attributed to faulty data collection, improper collection interfaces, or flawed data entry systems. If there is a lot of dirty data, then the entire network would be disregarded because there is no reliable connection among the claimants. A different scenario would arise if each of the claimants had a similar name. In this case, the investigators might discount the severity of the pattern if the names represented, let's say, "John Smith," "Johnny Smith," "J. Smith," "Jon Smithe," "Juan Smith," or "J.J. Smith."
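One way to keep placeholder values from creating false hubs is to discard them before grouping claimants by SSN. The sketch below assumes a flat list of (claimant, SSN) pairs and uses only the placeholder values quoted above. The function names and the nine-digit length check are illustrative assumptions, and a production list of invalid values would be longer.

```python
from collections import defaultdict

# Placeholder values quoted in the text; a real deployment would maintain a longer list.
PLACEHOLDER_SSNS = {"999999999", "000000000", "unknown", "not provided"}

def clean_ssn(raw):
    """Return the nine digits of a usable SSN, or None for dirty or placeholder data."""
    text = (raw or "").strip().lower()
    digits = "".join(ch for ch in text if ch.isdigit())
    if text in PLACEHOLDER_SSNS or digits in PLACEHOLDER_SSNS or len(digits) != 9:
        return None
    return digits

def claimants_by_ssn(pairs, min_claimants=2):
    """Group claimant names under each usable SSN and keep only the shared ones."""
    groups = defaultdict(set)
    for claimant, raw_ssn in pairs:
        ssn = clean_ssn(raw_ssn)
        if ssn is not None:
            groups[ssn].add(claimant)
    return {ssn: names for ssn, names in groups.items() if len(names) >= min_claimants}

# Made-up sample data.
pairs = [
    ("Ann Ray", "123-45-6789"),
    ("Bo Kim", "123456789"),
    ("Cy Orr", "000-00-0000"),
    ("Di Poe", "000000000"),
    ("Ed Fox", "unknown"),
]
hubs = claimants_by_ssn(pairs)
```

Here the two claimants sharing a plausible-looking SSN are kept as a hub, while the claimants "connected" only through placeholder values are dropped rather than reported as a network.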
Obviously, all of these names could reflect the same claimant. This would then raise the question: Is this person trying to avoid detection by varying his name? It would be highly unusual for each claim to have a different spelling, and it would certainly be suspicious if the network were formed from claim data provided by multiple insurance carriers. What is not explicitly conveyed in the diagram is that each link between a claimant and an SSN is generated based on the occurrence of a separate and unique claim. Thus, there are at least six claims involved in creating this network (if not more), and that fact alone would form the basis for starting an investigation.
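A rough way to test whether a set of names could be variants of one claimant is approximate string matching. The sketch below uses `difflib.SequenceMatcher` from the Python standard library. The normalization rules and the 0.7 threshold are arbitrary assumptions chosen for illustration, not a recommended setting.

```python
from difflib import SequenceMatcher

def normalize_name(name):
    """Lower-case, drop periods, and collapse runs of whitespace."""
    return " ".join(name.lower().replace(".", "").split())

def name_similarity(a, b):
    """Similarity ratio between 0.0 and 1.0 for two normalized names."""
    return SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()

def likely_variants(reference, candidates, threshold=0.7):
    """Return the candidates whose similarity to the reference meets the threshold."""
    return [c for c in candidates if name_similarity(reference, c) >= threshold]

# The name variants from the text, plus one unrelated name added for contrast.
names = ["Johnny Smith", "J. Smith", "Jon Smithe", "Juan Smith", "J.J. Smith", "Mary Jones"]
variants = likely_variants("John Smith", names)
```

A score like this is only a hint. "Juan Smith" clears the threshold but could easily be a different person, so the result is best treated as a prompt for review alongside the number of claims and carriers involved.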
A completely different interpretation can be made using a variation on this particular pattern. In this case, the center of the network represents a claimant who is related to six SSNs. This pattern is of most interest when the claimant has a unique name and the SSN values are also unique. Often, this type of pattern represents an intentional misrepresentation and can be easily acted upon. The exception comes when a common name such as "John Smith" is used as the claimant, because the SSNs are most likely valid and the network is formed based on too general a representation. Luckily, there are techniques and solutions for dealing with these types of situations through better data representation and disambiguation functions.
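The text does not say which disambiguation functions are meant. One simple possibility, shown here purely as an assumption, is to identify a claimant by name plus date of birth rather than by name alone. The record fields and values below are hypothetical.

```python
from collections import defaultdict

def ssns_per_identity(records, key_fields):
    """Group SSNs under an identity key and keep identities with more than one SSN."""
    groups = defaultdict(set)
    for rec in records:
        key = tuple(rec[field] for field in key_fields)
        groups[key].add(rec["ssn"])
    return {key: ssns for key, ssns in groups.items() if len(ssns) > 1}

# Made-up records: two different people named John Smith, and one person with two SSNs.
records = [
    {"name": "John Smith", "dob": "1960-01-01", "ssn": "000-00-0001"},
    {"name": "John Smith", "dob": "1982-07-15", "ssn": "000-00-0002"},
    {"name": "Zed Quill", "dob": "1975-05-05", "ssn": "000-00-0003"},
    {"name": "Zed Quill", "dob": "1975-05-05", "ssn": "000-00-0004"},
]

by_name = ssns_per_identity(records, ("name",))            # too general a representation
by_name_dob = ssns_per_identity(records, ("name", "dob"))  # narrower identity key
```

Keyed on name alone, the common name looks like one claimant with several SSNs. Keyed on name and date of birth, only the claimant who really has two SSNs on file remains flagged.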
In reality, the data will most likely show a combination of patterns where different numbers of claimants and SSNs are interrelated to form a more intricate and questionable network structure. The patterns grow more involved when the phones, addresses, and other pertinent claim data are represented within the network. These patterns are not unique to the insurance fraud world. They also appear when exposing money laundering networks, terrorist cells, and organized crime rings. Ultimately, the SIU investigator will need to know how to interpret these structures and act accordingly.
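When claimants, SSNs, phones, and addresses are all linked, a first step toward interpreting the result is to separate the data into its connected networks. The sketch below does this with a breadth-first search over a list of links. The link list is made up, and the approach is a generic graph technique rather than the method of any specific link analysis tool.

```python
from collections import defaultdict, deque

def connected_components(edges):
    """Split an undirected edge list into sets of mutually reachable nodes."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search from each unvisited node collects one network.
        queue = deque([start])
        seen.add(start)
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components

# Hypothetical links between claimants and the objects on their claims.
edges = [
    ("claimant:Ann", "ssn:A"),
    ("claimant:Bo", "ssn:A"),
    ("claimant:Bo", "phone:555-0100"),
    ("claimant:Cy", "phone:555-0100"),
    ("claimant:Di", "address:9 Oak Ave"),
]
networks = sorted(connected_components(edges), key=len, reverse=True)
```

In this example Ann, Bo, and Cy end up in one network even though Ann and Cy share nothing directly, which is how chains of shared SSNs and phones can expose a larger ring.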
Integrating Multiple Data Sources
The problem becomes even more complex when analysis requires the integration of multiple sources of data obtained from different insurance carriers. The insurance industry has long recognized that a reliable and effective means of combining data from different companies and sources is essential for exposing larger fraud rings. Going forward, it will become increasingly critical to integrate public records data, other insurance carrier data, and non-traditional sources, including law enforcement (narcotics, theft/burglary, and arrests), telephone subscribers, death indexes, and other referential sources.
Accessing billions of records from across potentially thousands of databases provides a challenging environment from which to conduct analyses. Making sense of all this data can be overwhelming, especially given the amount of variation in content and representation. Technologies already exist to access, query, integrate, combine, and present data in these types of environments. The key will be to maintain a focus on the analytical methodologies used to determine the most important patterns. This will also involve leveraging the extensive work already conducted in other domains, such as financial crimes, counter-terrorism, and law enforcement, and applying that to fraud.
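The variation in content and representation mentioned above usually has to be reduced before records from different sources can be compared. The following sketch maps two hypothetical carrier layouts onto a common schema and then looks for phone numbers that appear in more than one source. The source names, field names, and the ten-digit phone rule are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical per-source field mappings onto a common schema.
FIELD_MAPS = {
    "carrier_a": {"name": "claimant_name", "phone": "phone"},
    "carrier_b": {"name": "insured", "phone": "tel_no"},
}

def normalize_phone(raw):
    """Keep digits only and retain the last ten (assumes U.S.-style numbers)."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    return digits[-10:]

def to_common(source, record):
    """Translate one source-specific record into the common schema."""
    mapping = FIELD_MAPS[source]
    return {
        "source": source,
        "name": " ".join(record[mapping["name"]].lower().split()),
        "phone": normalize_phone(record[mapping["phone"]]),
    }

def phones_across_sources(common_records):
    """Return phone numbers reported by more than one source."""
    sources_by_phone = defaultdict(set)
    for rec in common_records:
        sources_by_phone[rec["phone"]].add(rec["source"])
    return {phone: srcs for phone, srcs in sources_by_phone.items() if len(srcs) > 1}

# Made-up input records in each carrier's own layout.
raw = [
    ("carrier_a", {"claimant_name": "Ann  Ray", "phone": "(212) 555-0100"}),
    ("carrier_b", {"insured": "RAY, ANN", "tel_no": "+1 212 555 0100"}),
    ("carrier_b", {"insured": "Bo Kim", "tel_no": "212-555-0177"}),
]
common = [to_common(source, record) for source, record in raw]
overlaps = phones_across_sources(common)
```

The phone number matches across carriers only after normalization, while the two name formats still differ, which illustrates why integration tends to need field-specific cleanup before any pattern analysis begins.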