Connecting the Dots of Insurance Fraud Using Graph Analytics
Radost Wenman

The insurance industry has been fighting its war on fraud ever since the first insurance policies were written in the 1800s. One important feature of that war has been change. Fraudsters have continually evolved their methods, innovating ways to elude insurers’ anti-fraud efforts and remain under the radar. This capacity for evolution and innovation makes it of utmost importance for companies to leverage the latest and most effective technologies to detect and catch fraud in their claims repositories.

In The State of Insurance Fraud Technology (2019), the most recent report published by the Coalition Against Insurance Fraud (CAIF) and the SAS Institute, nearly 75% of survey participants reported a rise in fraudulent claims in the past three years. This represents a disturbing 11% increase since 2014. The survey covered 84 insurers, primarily property and casualty carriers. None of the participating insurers indicated that fraud had decreased significantly during the same time frame. As a consequence of the rise in fraudulent claims, insurers are moving away from traditional formulaic business rules and red flags for identifying fraud. Instead, they are enriching their data analytics arsenals with more sophisticated tools and methods for investigating fraud.

The jump in fraud occurrences may be due partly to insurers having become more proactive and expert at detecting fraud. But, as the CAIF’s report clearly indicates, there is no doubt that fraud continues to exert pressure on insurers’ bottom lines. More specifically, the FBI estimates that the monetary impact of fraud on the insurance industry in the U.S. (excluding health insurance) amounts to more than $40 billion per year. That cost is passed on to consumers, who bear the brunt of it through higher premiums, on average an extra $400 to $700 per year.

The challenges of analyzing insurance fraud arise from the complex nature of insurance data, which can be highly varied and may be either structured or unstructured. Analysis may also be complicated by the tendency of traditional, less efficient data analytics to produce too many false positives. But newer, cutting-edge technologies have emerged that can view the data through different lenses. Rather than purely clustering observations or predicting anomalous ones, these technologies add a new dimension by visualizing structure and relationships within the data.

Graph analytics, also known as link or network analysis, is a prime example of such a progressive technique. It is grounded in graph theory, a mathematical discipline with a proven track record as a mainstay of data science, and insurers can benefit greatly from using it to “picture” their data and draw new types of insights.

In the most basic terms, a graph is a network of interconnected objects. Examples from real-world applications include financial systems connecting banks across the globe, social networks of individuals associated as friends, customers linked in a system of transactions, biological networks of protein-protein interactions and power grid networks. These interconnected systems can be readily represented as graphs made up of vertices, also known as nodes (the objects), and edges (the connections between objects). As a result, graph analytics can be particularly useful in analyzing a system’s structure to detect groups of nodes that share common characteristics or to detect anomalies that could lead to system failures (e.g., diseases related to specific proteins, blackout susceptibility of power grids, or significance of customers’ opinions).
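To make the node-and-edge vocabulary concrete, here is a minimal Python sketch that stores an undirected graph as a mapping from each node to the set of its neighbors. The customer names and the `build_graph` helper are hypothetical and serve only as an illustration.

```python
from collections import defaultdict


def build_graph(edges):
    """Return an undirected graph as a dict mapping each node to its neighbor set."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


# Hypothetical toy data: customers linked by transactions.
edges = [("alice", "bob"), ("bob", "carol"), ("dave", "erin")]
graph = build_graph(edges)

# The degree of a node is the number of edges that touch it.
degree = {node: len(neighbors) for node, neighbors in graph.items()}
```

In this toy network, "bob" is connected to two other customers, while "dave" and "erin" form a separate pair with no link to the rest.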

An important application of graph theory in the insurance domain is graph-based anomaly detection. Graph-based anomaly detection consists of identifying clusters of people, places, and events that share common features but as a whole exhibit behavior that is incompatible with the normal patterns of the remainder of the system and are consequently suspicious. Popular graph theory algorithms that address the problem of fraud detection include random walk analysis, minimum description length and multi-dimensional tensors. 
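As an illustration of the first family named above, the sketch below implements a random walk with restart, one common form of random walk analysis. It is an assumption that this variant is representative; the article does not specify one. The function name, the toy graph and the restart probability of 0.15 are all hypothetical. The resulting scores measure how close each node is to a chosen seed node, such as a party already known to be fraudulent.

```python
def random_walk_with_restart(graph, seed, restart=0.15, iterations=50):
    """Approximate proximity scores to `seed` by repeated propagation.

    At each step the walker jumps back to the seed with probability
    `restart`; otherwise it moves to a uniformly chosen neighbor.
    """
    score = {node: 0.0 for node in graph}
    score[seed] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in graph}
        for node, neighbors in graph.items():
            moving = (1 - restart) * score[node]
            if not neighbors:
                # A node with no neighbors sends its mass back to the seed.
                nxt[seed] += moving
                continue
            for nb in neighbors:
                nxt[nb] += moving / len(neighbors)
        # Scores always sum to 1, so the restart mass equals `restart`.
        nxt[seed] += restart
        score = nxt
    return score


# Hypothetical toy graph: a-b-c form a chain, d-e are unrelated.
graph = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b"},
    "d": {"e"},
    "e": {"d"},
}
scores = random_walk_with_restart(graph, seed="a")
```

Nodes that cannot be reached from the seed keep a score of zero, and scores are higher for nodes that the walk reaches more often.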

In the context of insurance fraud, the anomalous behavior could be a fraud ring, where several entities join forces to submit false claims and scam the insurance company. An example of a fraud ring could be a doctor, several lawyers, and their clients who file multiple small claims over time. Such small claims can be difficult, if not impossible, to detect with traditional tools, which cannot uncover the connections between the various players appearing in multiple instances of the fraud scheme. Graph analytics, however, can be an effective technique to connect and discover this type of collusive behavior among fraudsters.
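A minimal sketch of how such connections could be surfaced: link every claim to the parties named on it, then group claims that are connected through any shared party (connected components found by breadth-first search). The claim records, party names and the threshold of three claims are hypothetical, and a production system would need far more careful entity matching.

```python
from collections import defaultdict, deque

# Hypothetical claims: claim id -> parties named on the claim.
claims = {
    "C1": ["dr_smith", "lawyer_a", "client_1"],
    "C2": ["dr_smith", "lawyer_b", "client_2"],
    "C3": ["dr_smith", "lawyer_a", "client_3"],
    "C4": ["dr_jones", "lawyer_c", "client_4"],
}


def connected_claim_groups(claims):
    """Group claims that are linked, directly or indirectly, by shared parties."""
    graph = defaultdict(set)
    for claim_id, parties in claims.items():
        for party in parties:
            graph[("claim", claim_id)].add(("party", party))
            graph[("party", party)].add(("claim", claim_id))

    seen = set()
    groups = []
    for start in list(graph):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for nb in graph[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        # Keep only the claim ids from each connected component.
        groups.append(sorted(name for kind, name in component if kind == "claim"))
    return groups


groups = connected_claim_groups(claims)
# Hypothetical rule of thumb: review any group of three or more linked claims.
suspicious = [group for group in groups if len(group) >= 3]
```

Here the three claims that share the same doctor fall into one group even though they involve different clients and lawyers, which is the kind of link a claim-by-claim review would miss.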

A typical example observed in personal auto insurance may involve multiple providers (doctors, lawyers and body shops) and multiple participants (drivers, passengers, pedestrians and witnesses) who take part in several staged accidents and claim soft tissue injuries. Such injuries, which can be easily falsified but are expensive to treat, have appropriately given rise to the term “whiplash for cash.” In the multiple-accident scenario, a particular individual could pretend to be the driver in one accident but the passenger or pedestrian in another.

We can imagine a scenario in which six people collude to stage three accidents, with each person acting once as the driver and twice as a passenger. Hypothetically, if the average claim is $20,000 per person and $5,000 per car, this scenario could potentially result in $390,000 in total damages claimed. In a second, slightly more involved scenario, each of 10 people acts once as the driver and three times as a passenger or witness. Assuming similar payout amounts, this case could result in $850,000 of total damages claimed. In such cases, the involved parties may also share the same doctor and lawyer, who will attest to the injuries and represent the claimants.
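The two totals can be reproduced with a short calculation. The sketch below rests on one reading of the scenarios, which is an assumption rather than something the text spells out: every appearance in an accident (as driver, passenger or witness) produces one $20,000 personal claim, and each person's single turn as driver produces one $5,000 car claim. The function name is hypothetical.

```python
def total_claimed(people, appearances_per_person, person_claim=20_000, car_claim=5_000):
    """Total damages claimed under the assumptions described in the lead-in."""
    # One personal claim for every appearance of every participant.
    personal_total = people * appearances_per_person * person_claim
    # Each participant drives exactly once, so there is one car claim per person.
    car_total = people * car_claim
    return personal_total + car_total


# Scenario 1: six people, each in three accidents (one as driver, two as passenger).
scenario_one = total_claimed(people=6, appearances_per_person=3)

# Scenario 2: ten people, each in four accidents (one as driver, three as passenger or witness).
scenario_two = total_claimed(people=10, appearances_per_person=4)
```

Under these assumptions, the first scenario comes to $360,000 in personal claims plus $30,000 in car claims, and the second to $800,000 plus $50,000, matching the $390,000 and $850,000 quoted above.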

The first graphic below depicts the specific relationships in the single fraud cluster of the first scenario. The second image provides another example of a graphical visualization, this time of an entire real-world health-care dataset, in which the nodes represent doctors (red nodes) and pharmacies (blue nodes). The graph reveals clusters that are connected together by multiple narcotic transactions. The prominent clusters could point toward relationships involved in suspicious prescription practices.

The examples above demonstrate how much more powerful graph analytics can be than traditional, manually intensive methods in the quest to thwart insurance fraud. While each individual claim or transaction can appear genuine and lawful on its own, the hidden patterns of fraud become visible when it is analyzed in the broader context of the network and in relation to other transactions.



Graph analytics represents the next frontier of scientific approaches to help insurers detect novel fraud schemes by analyzing the multi-layered links among data objects. Even though carriers continue to rely on automated red flags and business rules, the CAIF’s survey indicated a significant increase in the number of carriers planning to deploy various anti-fraud technologies in 2019 compared to previous CAIF surveys.

It is noteworthy that link and network analysis constitutes the second most common approach to fraud detection after rule-based and red flag approaches. The transition from business rules toward automation is gradual, but it is clear that insurers are trying to capitalize on innovation to get better at identifying and combating insurance fraud.



Radost Wenman, FCAS, MAAA, CSPA, is a Consulting Actuary with Pinnacle Actuarial Resources, Inc. in the San Francisco, California office. She holds a Master of Science degree in Statistics and a Bachelor of Science degree in Mathematics from Stanford University. Radost has 14 years of experience in the Property and Casualty Insurance arena, focusing on pricing and product development. In this role, she has developed homeowners and private passenger auto pricing solutions through the design and implementation of advanced predictive models.

«April 2021»