Addressing Data Censoring in Credit Analytics and Model Building

Hi all,

Nice to see you again.

In this post, we start with an example that illustrates a core problem in loan portfolio analysis.

Problem example:

Data censoring occurs when outcomes are missing for part of a population, or are observed only for a selected and therefore biased subset. It is a significant problem for businesses across many sectors, and it matters most when the data feeds analytics and model building, where accurate and complete information is crucial.

Let’s take a critical example from the world of credit applications. In this context, a significant number of customers are rejected during the application stage. The absence of outcomes or labels for these applications means that businesses cannot directly use this valuable application data for building an acquisition model or testing out strategies.
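To make the effect concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the score in [0, 1), the cutoff of 0.5, and the rule that default probability falls linearly as the score rises. It is not a real scoring model. It shows that outcomes exist only for approved applicants, so the bad rate you can measure understates the bad rate of the full application base.

```python
import random

def simulate_applications(n=10_000, cutoff=0.5, seed=0):
    """Return a list of (score, approved, observed_outcome, true_outcome).

    observed_outcome is 1 (default), 0 (repaid), or None when the applicant
    was rejected and no performance was ever recorded. true_outcome is only
    known because this is a simulation; a real lender never sees it for
    rejected applicants.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        score = rng.random()              # hypothetical score in [0, 1)
        p_default = 0.4 * (1.0 - score)   # assumed: risk falls as score rises
        defaulted = 1 if rng.random() < p_default else 0
        approved = score >= cutoff
        rows.append((score, approved, defaulted if approved else None, defaulted))
    return rows

def observed_bad_rate(rows):
    """Bad rate computed only from applications that have a label."""
    labels = [obs for _, _, obs, _ in rows if obs is not None]
    return sum(labels) / len(labels)

def true_bad_rate(rows):
    """Bad rate over all applicants (available only in a simulation)."""
    return sum(true for _, _, _, true in rows) / len(rows)

rows = simulate_applications()
n_censored = sum(1 for _, _, obs, _ in rows if obs is None)
print(f"censored applications: {n_censored} of {len(rows)}")
print(f"bad rate among approved only: {observed_bad_rate(rows):.3f}")
print(f"bad rate across all applicants: {true_bad_rate(rows):.3f}")
```

A model trained only on the labeled rows sees none of the riskier, rejected part of the population, which is exactly the part an acquisition strategy needs to reason about.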

Solving this problem amounts to predicting an alternate reality, which is complex and challenging. It is like asking whether you would have passed your driving test the first time if you had taken more lessons. Without the ability to revisit the past, it’s nearly impossible to answer such questions definitively.

Proposed Solutions:

Controlled Testing:

The ideal way to address this problem is a carefully planned and executed experiment. For instance, in the driving test example, individuals with equally low driving aptitude can be grouped and exposed to varying conditions, such as the number of classes per week. By measuring the outcomes and controlling other factors, one can assess the impact of more lessons.

Translating this to credit applications means running a controlled test in which a small percentage of marginally declined applicants are approved. Because the performance of these normally declined customers is then actually observed, the test measures it directly and lets you assess the effectiveness of your strategies.
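A controlled test of this kind could be sketched as follows. The function name `decide`, the cutoff, the width of the "marginal" band below the cutoff, and the 5% test fraction are all hypothetical values chosen for illustration. The key point is that approval within the marginal band is random, so the test group is comparable to the marginal declines it represents.

```python
import random

def decide(score, cutoff=0.5, margin=0.1, test_fraction=0.05, rng=random):
    """Return 'approve', 'test_approve', or 'decline' for one application.

    Applicants scoring in [cutoff - margin, cutoff) are normally declined;
    a random test_fraction of them is approved for the experiment.
    """
    if score >= cutoff:
        return "approve"
    in_margin = score >= cutoff - margin
    if in_margin and rng.random() < test_fraction:
        return "test_approve"   # normally declined, approved for the test
    return "decline"

rng = random.Random(42)                        # fixed seed for repeatability
scores = [rng.random() for _ in range(20_000)]  # hypothetical scores
decisions = [decide(s, rng=rng) for s in scores]
counts = {d: decisions.count(d) for d in ("approve", "test_approve", "decline")}
print(counts)
```

Tagging test approvals separately from regular approvals matters: their later performance must be tracked as its own group, otherwise the test results are diluted by the normally approved population.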

However, this approach comes with significant drawbacks. It can be costly and necessitates robust control mechanisms. Additionally, it’s time-consuming as the outcomes of these tests require a significant period to assess. This approach works well for optimizing existing processes and policies but falls short when decisions on something new are required.

Historical Data Analysis:

If controlled testing is infeasible, historical data analysis becomes an option. This process involves examining outcomes from past scenarios that resemble the current situation. In practice, it means sourcing a database of past instances, such as driving tests, and assessing the impact of various factors on success rates.

This approach is particularly beneficial in the context of credit applications. Credit bureaus essentially offer this service, making it a practical approach to consider. However, it isn’t without its challenges. Finding historic records that match your current situation perfectly can prove extremely difficult. For example, if a marketing campaign encourages applicants with poor credit scores to apply with promises of no late payment fees, the application base will inherently be riskier than the population behind the external records. Thus, you can’t directly add external performance data to your application model development dataset.
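One way to check how closely an external population matches your own application base is to compare their score distributions. The sketch below uses the Population Stability Index (PSI), a measure commonly used in credit risk; it is a swapped-in illustration, not a method the discussion above prescribes. The function name `psi`, the ten equal-width bins, and the synthetic samples are assumptions, and scores are assumed to lie in [0, 1].

```python
import math

def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index between two samples of scores in [0, 1].

    Zero means the binned distributions are identical; larger values mean
    the two populations differ more.
    """
    def shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = min(int(v * n_bins), n_bins - 1)  # clamp v == 1.0 into last bin
            counts[idx] += 1
        # Floor each share at eps so the logarithm is always defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))

# Synthetic data: external records spread evenly over the score range,
# while the campaign-driven application base is skewed toward low scores.
external = [i / 1000 for i in range(1000)]
applicants = [(i / 1000) ** 2 for i in range(1000)]

print(f"PSI, external vs. itself:     {psi(external, external):.3f}")
print(f"PSI, external vs. applicants: {psi(external, applicants):.3f}")
```

A clearly nonzero PSI is a warning that performance observed in the external data may not carry over to your applicants without some adjustment.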

This problem is a considerable challenge and has no definitive solution. The best approach is to stay flexible and use the available tools effectively, choosing techniques and methodologies based on the specific situation and the resources at hand. Doing so helps overcome the problems caused by data censoring and leads to more robust and effective models and strategies.

Summary

Data censoring is a significant issue in the realm of analytics and model building, particularly in the context of credit applications. Not having the outcomes for rejected applications can restrict the potential of building effective acquisition models or strategies. However, solutions exist that can help overcome this problem:

Controlled Testing:

Pros: This involves running a controlled experiment, like approving a small fraction of marginally declined applications. It can measure the performance of normally declined customers and help assess the effectiveness of the strategies.
Challenges: This approach can be costly and requires robust control mechanisms. It’s also time-consuming, as the outcomes of these tests need time to fully manifest before they can be evaluated. It may not be applicable when dealing with new situations or policies that differ significantly from the tested scenario.

Historical Data Analysis:

Pros: This process involves analyzing outcomes from past scenarios similar to the current situation. It can offer insight into how different factors influenced past outcomes, providing a basis for future decisions.
Challenges: Finding historic records that perfectly match your current situation can be difficult. Additionally, external variables, like marketing campaigns, can significantly alter the risk profile of the application base, making it challenging to directly add external performance data to your application model development dataset.

There is no single ultimate solution; what helps is flexibility and effective use of the available tools. Businesses should be ready to adapt their methodologies based on the situation and available resources to overcome the problems presented by data censoring.