Deconstructing the Data Dilemma in Banks: Banks have a false sense of having data

Hi all,

Today we are starting our series of articles about data usage, AI/ML modeling, and analytics in banking.

The first one is about issues with data availability. So, let’s start.

In today’s banking sector, institutions often operate under a false sense of data availability. They may believe they possess ample data for loan portfolio analysis, decision-making, and machine learning (ML) models, when in reality a significant portion of this data is inaccessible or stored in a format incompatible with their analytical systems. This common misconception is a critical roadblock to better data-driven strategies and predictive analytics.

Problem: The Illusion of Abundant Data

When banks receive data into their internal production systems, it typically arrives in a ‘raw’ format, such as XML or JSON. These formats are nested rather than tabular and are therefore not immediately usable for most analytical or ML use cases, since the vast majority of ML models and risk analysis tools require tabular input. Furthermore, the data must be consistent over time. A system that retains a given field for only the most recent month and lacks it for historic periods is of little use to these models.
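To make the gap between raw and tabular data concrete, here is a minimal sketch of flattening one nested JSON response into a single flat row. The payload, its field names, and the `flatten` helper are all hypothetical and chosen only for illustration; a real bureau or application response would be larger and would need explicit rules for repeated elements.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts and lists into one flat dict with dotted keys."""
    row = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            row.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            row.update(flatten(value, f"{prefix}{index}."))
    else:
        # Leaf value: drop the trailing dot from the accumulated prefix.
        row[prefix[:-1]] = obj
    return row


# Hypothetical raw response, as it might arrive in a production system.
raw = (
    '{"applicant": {"id": "A-1", "income": 52000}, '
    '"loans": [{"balance": 1200}, {"balance": 300}]}'
)

row = flatten(json.loads(raw))
for column, value in row.items():
    print(column, value)
```

Each key of `row` becomes a column name, so many responses flattened this way can be stacked into a table. Note that list positions (`loans.0`, `loans.1`) produce a variable number of columns, which is one reason real pipelines usually aggregate repeated elements (counts, sums, maximums) instead.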

Unfortunately, the data that enters production systems is often not saved in full: only the parts used by the production systems are retained, and the rest is discarded.

This situation can occur due to miscommunication between IT infrastructure teams and data scientists or risk analysts. In more forgiving circumstances, part of the response is parsed and stored in a format usable for later analytical processing, while the rest is relegated to a server where the data science team can rarely reach it.
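One way to keep the unparsed remainder within reach is to archive every response in full, with a timestamp, next to whatever the production system parses. The sketch below is an illustration only: it uses an in-memory SQLite database, and the table name `raw_responses`, its columns, and the `archive_response` helper are assumptions, not a description of any particular bank's setup.

```python
import json
import sqlite3
from datetime import datetime, timezone


def archive_response(conn, application_id, raw_payload):
    """Store the complete raw payload, untouched, with its arrival time."""
    conn.execute(
        "INSERT INTO raw_responses (application_id, received_at, payload) "
        "VALUES (?, ?, ?)",
        (application_id, datetime.now(timezone.utc).isoformat(), raw_payload),
    )
    conn.commit()


# In-memory database for the example; a real archive would be persistent.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_responses "
    "(application_id TEXT, received_at TEXT, payload TEXT)"
)

# Hypothetical response: production may only use "score" today.
archive_response(conn, "A-1", '{"score": 640, "inquiries": 3}')

stored = conn.execute(
    "SELECT payload FROM raw_responses WHERE application_id = ?", ("A-1",)
).fetchone()[0]
print(json.loads(stored))
```

Because the payload is kept whole, a field that production ignores today (here, `inquiries`) can still be extracted later for every historic application.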

This mismatch between data formats and data availability limits banks’ ability to improve models without expending significant time and effort to retrieve additional data.

Solution: Harnessing Integrated Data Transformation Tools

To combat this issue, banks can use integrated data transformation tools to convert incoming data into a more usable format. These tools should be easily accessible to ML and risk teams, thereby lowering the cost of creating new features. Ideally, they should be separated from underwriting tools so that the bank can store more features than it uses in production. This separation makes it possible to feed extra data into models to find new patterns, and to adapt and evolve models as needed.
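The separation described above can be sketched as a set of feature definitions owned by the analytics side, of which production uses only a subset. Everything here is hypothetical: the feature names, the payload fields, and the `build_feature_row` helper are assumptions made for illustration, not a reference to any specific tool.

```python
import json

# Hypothetical feature definitions maintained by the ML and risk teams.
# Each feature is a small function over the parsed raw response.
FEATURES = {
    "bureau_score": lambda r: r.get("score"),
    "inquiry_count": lambda r: r.get("inquiries"),
    "has_delinquency": lambda r: int(r.get("delinquencies", 0) > 0),
}

# The underwriting (production) system currently uses only this subset.
PRODUCTION_FEATURES = {"bureau_score"}


def build_feature_row(raw_payload):
    """Compute every defined feature from one raw JSON response."""
    record = json.loads(raw_payload)
    return {name: extract(record) for name, extract in FEATURES.items()}


row = build_feature_row('{"score": 640, "inquiries": 3, "delinquencies": 1}')
production_row = {k: v for k, v in row.items() if k in PRODUCTION_FEATURES}

print(row)
print(production_row)
```

Adding a new feature means adding one entry to `FEATURES`; the underwriting system is unaffected until a feature is explicitly promoted into `PRODUCTION_FEATURES`, while the full row is stored for future modeling.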

Pros and Challenges

The implementation of integrated data transformation tools provides several distinct benefits. First, it eliminates the barrier between the data and those who use it, simplifying the process of generating new features. It also allows for more substantial data storage, leading to a richer data environment and, consequently, more robust models.

Moreover, this approach lets banks take full advantage of their data by enabling ML and risk teams to find and leverage new patterns. In turn, this enhances predictive analytics capabilities and improves overall business intelligence.

However, there are challenges associated with the integration of data transformation tools. For one, the transformation process may initially be time-consuming and tedious, requiring banks to invest in training and tool development. Furthermore, there may be resistance from different teams within the organization to change established processes and adapt to this new system. The separation of these tools from existing underwriting tools may also add to the complexity, potentially leading to initial hiccups in the integration process.

Despite these challenges, the benefits of data transformation tools are considerable. As the banking industry continues to recognize the value of data-driven strategies, these tools provide an effective solution to the prevailing data illusion problem, leading to improved decision-making, risk management, and ultimately, superior customer service and loan portfolio analysis.

The next two articles are about Feature Store Technology and Data Censoring in Credit Analytics.

Stay tuned.