Financial services is one of the industries best positioned to leverage all the benefits AI has to offer. From underwriting to fraud detection, financial institutions are data-rich and driven by document-intensive processes, precisely the kind of work AI excels at streamlining.
It’s no surprise, then, that AI investment across banking, insurance, capital markets, and payments is projected to reach $97 billion by 2027. The shift is already well underway: 59% of bank employees use AI tools in their daily work.
But history has taught us that technological leaps, adopted without foresight, can be a dangerous game. The algorithmic trading boom of the 1980s and 1990s saw automated systems promise faster trades, greater efficiency, and higher profits, and they delivered. But the boom also introduced a new kind of instability.
High-frequency trading led to flash crashes and liquidity gaps, a byproduct of a model designed for speed over control.
The same risk looms over AI in financial services. The allure of efficiency, cost savings, and improved customer experience is real. But rushing forward
without proper frameworks could expose fintechs to faulty outcomes, reputational damage, or regulatory intervention.
Where AI fails
Twenty years ago, few would have guessed that life-changing banking decisions, like approving mortgages or setting credit limits, would one day be made by computers and algorithms. Yet a warning buried in a 1979 IBM manual feels more relevant than ever: “A computer can never be held accountable, therefore a computer must never make a management decision.” Today, fintechs are testing that boundary.
Biased decision-making
AI is increasingly being used for lending decisions: research has shown that up to 40% of loan approvals now involve AI analysis. However, AI is only as unbiased as the data it’s fed, a fundamental limitation when applying it to decision-making. If a model learns from historical patterns that are themselves incomplete or discriminatory, financial institutions can quickly find themselves automating unfairness at scale.
Lack of human governance
Fraud detection is another area where leaders might overestimate AI’s capabilities. AI is exceptionally fast and can handle enormous volumes of transactions, but fraudsters are also using AI to bypass detection. Deloitte predicts that GenAI could enable fraud losses to reach $40 billion in the U.S. by 2027. When teams over-rely on automation, human vigilance dulls. And without clear oversight and escalation triggers, new schemes that fall outside existing patterns take longer to detect.
Predicting unexpected anomalies
Risk modeling may be where over-reliance is most dangerous. AI does a fine job of spotting trends in historical data because it performs well with known patterns, but most financial shocks don’t resemble historical data.
A stark example: during the COVID-19 pandemic, around 35% of U.K. banks reported a decline in machine learning model performance. The pandemic triggered an unprecedented economic disruption, one that couldn’t be forecast by models trained on historical predictors or conventional macroeconomic data.
AI is a pattern recognition tool, not a prophet, and fintech leaders must understand where AI’s accuracy ends and human oversight begins. In a sector where
transparency is non-negotiable, AI and human-in-the-loop processes become essential, not optional.
Building guardrails for escalation
AI can outperform humans, completing complex tasks in mere seconds. But that doesn’t mean it should operate unchecked. AI is smart but not infallible, and there should always be a mechanism for human oversight when decisions directly affect a person’s financial well-being. As the IBM manual stated, computers can’t be held accountable. And when decisions rest on probabilistic AI models, as is often the case in financial services, negative outcomes are always possible, whether the decision-maker is a human or a machine.
Therefore, fintech leaders must focus on human-in-the-loop designs and escalation frameworks that challenge biased decision-making, surface blind spots, and increase oversight. That means setting decision thresholds, such as low confidence scores or counterintuitive outputs, as triggers for human review. If AI returns a low confidence score or denies a long-term customer a small credit increase, that’s a signal to escalate.
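To make that concrete, here is a minimal sketch of such an escalation check in Python. The field names and thresholds are hypothetical; in practice each firm would set its own triggers as a matter of risk policy.

```python
# Hypothetical escalation check: route low-confidence or counterintuitive
# AI decisions to a human reviewer instead of acting on them automatically.
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.80  # assumed policy value, tuned per product

@dataclass
class CreditDecision:
    approved: bool
    confidence: float           # model's confidence score for this decision
    customer_tenure_years: int  # how long the applicant has been a customer
    requested_increase: float   # size of the requested credit increase, in dollars

def needs_human_review(d: CreditDecision) -> bool:
    # Trigger 1: the model is unsure about its own output.
    if d.confidence < CONFIDENCE_THRESHOLD:
        return True
    # Trigger 2: counterintuitive outcome, e.g. denying a long-term
    # customer a small credit increase.
    if not d.approved and d.customer_tenure_years >= 5 and d.requested_increase <= 500:
        return True
    return False

decision = CreditDecision(approved=False, confidence=0.91,
                          customer_tenure_years=8, requested_increase=300)
print("Escalate to human reviewer:", needs_human_review(decision))
```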
For example, bias can appear subtly through proxy variables. If an AI system uses an applicant’s email domain and phone model to infer income or education when reviewing a loan application, that’s correlation masked as insight. These proxies frequently reflect socioeconomic biases, not true creditworthiness. A human-in-the-loop system should assign such a decision a low confidence score, prompting a human reviewer to step in and override.
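A rough illustration of how that check might work, assuming the model exposes per-feature contributions; the feature names and the 30% review threshold are purely hypothetical.

```python
# Hypothetical sketch: flag decisions that lean heavily on features known to
# act as socioeconomic proxies (e.g. email domain, phone model) so a human
# reviewer can override. Feature names and weights are illustrative only.
PROXY_FEATURES = {"email_domain", "phone_model", "postal_code"}

def proxy_contribution(feature_contributions: dict[str, float]) -> float:
    """Share of total attribution coming from known proxy features."""
    total = sum(abs(v) for v in feature_contributions.values()) or 1.0
    proxy = sum(abs(v) for f, v in feature_contributions.items() if f in PROXY_FEATURES)
    return proxy / total

contributions = {"income": 0.20, "employment_status": 0.15,
                 "email_domain": 0.35, "phone_model": 0.25}
if proxy_contribution(contributions) > 0.30:   # assumed review threshold
    print("Proxy-heavy decision: assign low confidence and escalate to a reviewer")
```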
For unexpected anomalies, like economic shocks, escalation frameworks must include monitoring for model drift. If a system starts delivering inconsistent results across similar cases, especially during volatile periods, that’s a red flag signaling that human intervention, and potentially model retraining, is needed.
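As one illustrative drift signal, the Population Stability Index (PSI) compares the model’s current score distribution with the one it was validated on. The sketch below uses synthetic scores, and the 0.25 cutoff is a common rule of thumb rather than a regulatory standard.

```python
# Illustrative drift check using the Population Stability Index (PSI).
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """PSI between a baseline score sample and a live score sample."""
    cuts = np.quantile(expected, np.linspace(0, 1, bins + 1))
    cuts[0] -= 1e-9                                    # make the first bin left-inclusive
    e_frac = np.histogram(expected, cuts)[0] / len(expected)
    a_frac = np.histogram(np.clip(actual, cuts[0], cuts[-1]), cuts)[0] / len(actual)
    e_frac, a_frac = np.clip(e_frac, 1e-6, None), np.clip(a_frac, 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

baseline = np.random.default_rng(0).beta(2, 5, 10_000)   # validation-time scores
live = np.random.default_rng(1).beta(2, 3, 10_000)       # this week's scores

drift = psi(baseline, live)
if drift > 0.25:   # rule of thumb: above 0.25 suggests a significant shift
    print(f"PSI = {drift:.2f}: escalate for review and consider retraining the model")
```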
Escalation shouldn’t be seen as a failure; AI can assist, but it can’t self-correct bias, validate fairness, or anticipate the next black swan.
These frameworks build accountability and keep regulatory risk in check. A well-governed system knows when to defer to human judgment to ensure decisions remain contextual and transparent.
Balancing speed with scrutiny
Fast-scaling fintechs naturally want to move quickly, but they must build explainability into the process, not bolt it on later. They can’t build trust on “ethical debt,” and they should incorporate explainability tools to understand how AI models make decisions. These tools not only improve regulatory compliance but also help communicate decisions and build customer trust.
One tool is SHapley Additive exPlanations (SHAP). It helps detect and mitigate bias in AI decision-making by assigning each factor a score for how much it contributed to a decision. If AI denies a loan, SHAP can show how much income or employment status influenced the outcome. If unexpected factors rank highly, like address, that may point to bias that needs review.
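A minimal sketch of how this might look with the open-source shap package, using a small synthetic credit dataset; the model, data, and feature names are purely illustrative.

```python
# Illustrative SHAP example: score how much each (hypothetical) feature
# contributed to a single credit decision. Requires `shap` and `scikit-learn`.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "income": rng.normal(55_000, 15_000, 1_000),
    "employment_years": rng.integers(0, 30, 1_000),
    "address_changes": rng.integers(0, 6, 1_000),   # a potential proxy feature
})
y = ((X["income"] + 2_000 * X["employment_years"]
      + rng.normal(0, 10_000, 1_000)) > 70_000).astype(int)   # 1 = approve

model = GradientBoostingClassifier().fit(X, y)

# Explain one application: which features pushed the score up or down?
explainer = shap.TreeExplainer(model)
contributions = explainer.shap_values(X.iloc[[0]])[0]
for feature, value in zip(X.columns, contributions):
    print(f"{feature}: contribution {value:+.3f}")
```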
Local Interpretable Model-Agnostic Explanations (LIME) supports human oversight by building a simple surrogate model for each individual prediction. Teams can then explain to customers why a loan application was flagged, for instance, and share that a recent late payment influenced the decision. This approach creates accountability and helps humans validate whether the model’s reasoning aligns with policy and fairness goals.
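A comparable sketch with the open-source lime package, again on synthetic data with illustrative feature names: LIME fits a simple local model around one prediction so a reviewer can read off the top drivers.

```python
# Illustrative LIME example: build a local surrogate explanation for one
# prediction from a hypothetical credit model. Requires `lime` and `scikit-learn`.
import numpy as np
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

feature_names = ["income", "employment_years", "recent_late_payments"]  # illustrative
X, y = make_classification(n_samples=1_000, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

explainer = LimeTabularExplainer(X, feature_names=feature_names,
                                 class_names=["deny", "approve"], mode="classification")
explanation = explainer.explain_instance(X[0], model.predict_proba, num_features=3)

# Each entry pairs a human-readable condition with its local weight,
# e.g. ("recent_late_payments > 0.52", -0.18).
for condition, weight in explanation.as_list():
    print(f"{condition}: {weight:+.3f}")
```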
Counterfactual analysis helps address unexpected anomalies by asking: What’s the smallest change that would have led to a different decision? For example,
had this applicant earned $5,000 more, the model would have approved the loan. But if the answer is unclear, this could point to model instability, especially during economic shocks.
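A toy version of that question, using a hypothetical logistic-regression credit model and a simple search over income; dedicated counterfactual tooling is more sophisticated, but the underlying idea is the same.

```python
# Illustrative counterfactual search: find the smallest income increase that
# flips a (hypothetical) model's denial into an approval.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
income_k = rng.normal(50, 12, 1_000)                 # income in $000s
late_payments = rng.integers(0, 5, 1_000)
X = np.column_stack([income_k, late_payments])
y = (income_k - 4 * late_payments > 45).astype(int)  # 1 = approve (toy rule)
model = LogisticRegression(max_iter=1_000).fit(X, y)

applicant = np.array([[42.0, 1.0]])                  # $42k income, one late payment
original = model.predict(applicant)[0]

def smallest_income_increase(step_k=0.5, max_steps=40):
    """Smallest tested income bump (in $000s) that changes the decision, or None."""
    for i in range(1, max_steps + 1):
        candidate = applicant.copy()
        candidate[0, 0] += i * step_k
        if model.predict(candidate)[0] != original:
            return i * step_k
    return None   # no nearby counterfactual: a possible sign of model instability

print("Income increase needed to flip the decision ($000s):", smallest_income_increase())
```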
Meanwhile, regulations like the EU AI Act are institutionalizing the need for these safeguards. Fintechs using high-risk AI systems, such as those involved in credit decisions, must document model logic, monitor for drift, and allow for human appeal. The law reinforces the need for fintechs to manage bias, maintain oversight, and catch anomalies before they scale.
In financial services, the danger lies in fintech leaders treating AI as a decision-maker rather than a decision-support tool. But those who use AI responsibly,
blending innovation with a willingness to intervene and question, will be able to scale safely, building trust with customers and regulators.