
BIFSG for Race Imputation in Colorado: Does it Pass the Test?


The Colorado Division of Insurance (DOI) recently issued an unprecedented regulation requiring life insurance companies to assess their data and models for potential unfair discrimination. The regulation aims to address potential racial bias in life insurance underwriting and pricing by requiring insurers to explicitly consider the impact of several protected characteristics, including race, on underwriting and pricing decisions. Its purported goal is to ensure that all consumers have access to fair treatment in the insurance market by having insurers analyze their underwriting and pricing systems for potential racial disparities.

Since insurance companies do not collect racial data directly from applicants, the DOI mandates the use of the Bayesian Improved First Name Surname Geocoding (BIFSG) method to estimate the race and ethnicity of policyholders. This requirement is intended to bridge the data gap, allowing insurers to analyze their data and models for potential unfair discrimination even without directly collected racial data.

While this initiative aims to address potential racial bias in underwriting and pricing, concerns have been raised about the limitations of the BIFSG method. Critics argue that BIFSG may not accurately reflect an individual's true race or ethnicity, potentially leading to misleading results and an overstatement of disparate impact. This is because BIFSG relies on assumptions about the relationship between names, locations, and race, which can be inaccurate.

The BIFSG method and its predecessor, BISG (Bayesian Improved Surname Geocoding), referred to collectively here as BI(F)SG, belong to a class of proxy methods used to predict a sensitive attribute, such as race, from other available data. Methods that infer race/ethnicity from first name, surname, and geolocation have been proposed in the US to assess disparities in domains such as transportation, finance, and healthcare. BI(F)SG applies Bayes' theorem to counts from the Census Bureau's databases to predict an individual's race based on their first name, surname, and address. Notwithstanding the widespread use of BI(F)SG, its application is not without controversy: empirical studies have shown that BI(F)SG can overstate disparate impact and have highlighted the challenge of using such methods confidently in practice.
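To make the mechanics concrete, the sketch below shows how the Bayes calculation combines the three data sources. It is a minimal illustration only: the counts and probabilities are invented placeholders rather than actual Census figures, the race/ethnicity groupings are simplified, and the calculation relies on the conditional independence assumption discussed later in this post.

```python
# Illustrative sketch of the BI(F)SG Bayes calculation.
# All values below are hypothetical placeholders, NOT actual Census data.
# The calculation assumes first name, surname, and location are
# conditionally independent given race.

RACES = ["White", "Black", "Hispanic", "API", "Other"]

# Hypothetical inputs for one individual:
# P(race | surname) from the surname list, then P(first name | race) and
# P(geocoded area | race) from the respective reference tables.
p_race_given_surname = {"White": 0.70, "Black": 0.10, "Hispanic": 0.12, "API": 0.05, "Other": 0.03}
p_firstname_given_race = {"White": 0.020, "Black": 0.005, "Hispanic": 0.001, "API": 0.002, "Other": 0.004}
p_geo_given_race = {"White": 0.0010, "Black": 0.0030, "Hispanic": 0.0008, "API": 0.0005, "Other": 0.0010}

def bifsg_probabilities():
    """Combine the three sources via Bayes' theorem and normalize to 1."""
    unnormalized = {
        r: p_race_given_surname[r] * p_firstname_given_race[r] * p_geo_given_race[r]
        for r in RACES
    }
    total = sum(unnormalized.values())
    return {r: v / total for r, v in unnormalized.items()}

print(bifsg_probabilities())  # probabilities over the race/ethnicity groups, summing to 1
```

The final step simply rescales the products so the probabilities across groups sum to 1, which is why the raw output can be read as a set of likelihoods, one per group, for each individual.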

The use of proxy methods to assess unfair discrimination against groups with protected characteristics remains controversial and under-researched. While advanced methods like BI(F)SG show promise, various studies [2,3,4,5] indicate they can lead to inaccurate assessments because of the interconnectedness of lending outcomes, location, and race. A comprehensive understanding of the limitations of proxy methods, and of how to address these biases, is still lacking. Closing this gap is crucial, given the widespread use of proxy methods and the significant impact of disparity assessments on management and decision making.

The raw output BI(F)SG generates is a score between 0 and 1 for each racial/ethnic group, representing the likelihood that an individual belongs to that group; the probabilities across all groups sum to 1. Fair lending assessments aim to compare lending outcomes between a protected group (e.g., a specific race) and a control group. To do this, borrowers must be definitively assigned to a single race/ethnicity, which requires converting the continuous BI(F)SG probability into a binary classification (e.g., belonging to a specific race or not). Two primary approaches have been used to achieve this transformation, both sketched in the short example that follows the list:

  1. Fixed Threshold: The BI(F)SG probability is categorized into a binary classification using a predetermined cutoff value.
  2. Continuous Probability: The continuous BI(F)SG probability is used directly without further categorization.
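A minimal sketch of the two approaches is shown below. All of the numbers, the 0.80 cutoff, and the variable names are hypothetical choices for illustration; none of them come from the Colorado regulation or the studies cited above.

```python
# Illustrative sketch of the two ways a continuous BI(F)SG probability is
# used in a disparity assessment. All probabilities, outcomes, and the
# threshold are hypothetical examples.

bifsg_prob_black = [0.91, 0.42, 0.78, 0.15]   # hypothetical per-person probabilities
outcomes = [1, 0, 1, 1]                        # hypothetical adverse-outcome indicator (1 = denied)

# 1. Fixed threshold: assign each person to the group if the probability
#    exceeds a predetermined cutoff, then compare average outcomes.
THRESHOLD = 0.80
assigned = [p >= THRESHOLD for p in bifsg_prob_black]
denial_rate_assigned = (
    sum(o for o, a in zip(outcomes, assigned) if a) / sum(assigned)
)

# 2. Continuous probability: weight each person's outcome by the probability
#    itself, with no hard assignment to a single group.
denial_rate_weighted = (
    sum(o * p for o, p in zip(outcomes, bifsg_prob_black)) / sum(bifsg_prob_black)
)

print(denial_rate_assigned, denial_rate_weighted)
```

Even in this tiny example the two figures differ, which hints at why the choice of approach, and of the threshold itself, can materially change a disparity assessment.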

It has been shown that the continuous BI(F)SG probability is susceptible to estimation bias for two key reasons. First, the population used to develop the reference databases for first names, surnames, and geographic locations may not accurately reflect the specific population of borrowers for a particular credit product. Second, the calculation of BI(F)SG probabilities assumes that first name, surname, and location are conditionally independent given race, an assumption that may not hold true in reality.

Since the fixed BI(F)SG approach relies on the continuous BI(F)SG probability as input, any biases present in the continuous probability can propagate into misclassifications within the fixed approach. Furthermore, the fixed method itself introduces potential for additional misclassifications. This approach necessitates the use of a threshold to categorize the continuous probability into a binary classification. While a higher threshold reduces the likelihood of false positives, it increases the risk of false negatives. If the threshold is too high, a borrower might not be assigned to any of the race/ethnicity groups. Conversely, a lower threshold can lead to an increased risk of false positives and potentially assign borrowers to multiple racial/ethnic groups simultaneously.
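As a small illustration of this trade-off, the sketch below uses an invented set of probabilities for a person whose BI(F)SG output is split between two groups; the cutoffs are arbitrary and chosen only to show the two failure modes.

```python
# Illustrative sketch of the threshold trade-off. The probabilities and
# cutoffs below are hypothetical, chosen only to show the two failure modes.

# A person whose BI(F)SG probabilities are split across two groups:
probs = {"White": 0.46, "Hispanic": 0.44, "Black": 0.06, "API": 0.03, "Other": 0.01}

def assign(probs, threshold):
    """Return every group whose probability clears the cutoff."""
    return [race for race, p in probs.items() if p >= threshold]

print(assign(probs, 0.80))   # [] -- cutoff too high: assigned to no group
print(assign(probs, 0.40))   # ['White', 'Hispanic'] -- cutoff too low: assigned to two groups
```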

In the insurance context, the Society of Actuaries (SOA) published a research paper in April 2024, Statistical Methods for Imputing Race and Ethnicity [6], describing a range of techniques for developing probabilistic estimates of race/ethnicity, including BI(F)SG, and comparing them with a focus on relative accuracy and potential bias. The SOA makes the following additional observations regarding the limitations of the BIFSG method:

  • The exact counts for infrequent surnames in some cohorts are not available or are suppressed altogether due to privacy concerns; these omitted or suppressed surnames may impact less represented race/ethnicity cohorts more than others.
  • The methodology performs poorly for identifying self-reported American Indian/Alaska Native and multiracial individuals.
  • A notable difference between the calibration curves for BIFSG and BISG appears in the API (Asian or Pacific Islander) cohort, where BIFSG shows a clear tendency to understate the size of the population, demonstrating a deterioration in performance from BISG to BIFSG for this cohort.

In conclusion, while the BI(F)SG approach offers a potential avenue for assessing disparities in insurance outcomes, its application requires careful consideration. The inherent limitations of proxy-based methods, including potential biases and the lack of granular data, necessitate a thorough evaluation before widespread adoption. It is essential to prioritize independent research and pilot programs to assess the accuracy and effectiveness of BI(F)SG within the insurance context. This cautious approach will ensure that any decisions made regarding the use of BI(F)SG are data-driven and minimize the potential for unintended consequences.

References:

  1. Kallus, N., Mao, X., & Zhou, A. (2019). Assessing Algorithmic Fairness with Unobserved Protected Class Using Data Combination. arXiv preprint arXiv:1906.00285.
  2. Baines, A. P., & Courchane, M. J. (2014). Fair Lending: Implications for the Indirect Auto Finance Market. https://www.crai.com/insights-events/publications/fair-lending-implications-indirect-auto-finance-market/
  3. Charles River Associates. https://afsaonline.org/portals/0/Federal/Issue%20Briefs/Study%20Key%20Findings.pdf
  4. Zhang, Y. (2016). Assessing Fair Lending Risks Using Race/Ethnicity Proxies. Management Science, 64(1), 178–197.
  5. Chen, J., Kallus, N., Mao, X., Svacha, G., & Udell, M. (2019). Fairness Under Unawareness: Assessing Disparity When Protected Class Is Unobserved. Proceedings of the Conference on Fairness, Accountability, and Transparency, 339–348. ACM.
  6. Society of Actuaries (SOA). (2024). Statistical Methods for Imputing Race and Ethnicity.
