Explainable AI

XAI

Explainable AI (XAI) is a set of techniques and design principles that make AI model decisions understandable to humans by answering one question: why did the model produce a given output? In finance, it is essential for regulatory compliance, audit defensibility, user trust, and effective debugging of AI-native and agentic systems.

Key Takeaways

  • Explainable AI makes model decisions transparent enough for auditors, regulators, customers, and finance operators to understand and challenge.
  • Regulations including the EU AI Act, the GDPR (Article 22 on automated decisions, together with its transparency provisions), and fair lending rules require meaningful explanations for many high-risk AI uses in finance.
  • Models sit on a spectrum from glass-box (linear regression, decision trees) through gray-box (gradient boosting with feature importance) to black-box (deep neural networks, LLMs).
  • Common techniques include SHAP values, LIME, attention visualisation, counterfactual explanations, and source citations for RAG systems.
  • AI-native AR platforms should expose confidence scores, source citations, feature attribution, and clear handoff points to human review when confidence is low.

What Explainable AI is and why it matters in finance

Explainable AI, often shortened to XAI, refers to the techniques and design principles that make the outputs of AI systems understandable to the humans who rely on them. The core question XAI tries to answer is simple: why did the model produce this output? In a finance context that question is rarely academic. A CFO who cannot explain why a customer was denied credit, why a payment was matched to a specific invoice, or why a cash forecast moved by ten percent has a real problem.

Four pressures push explainability up the finance agenda. Regulators increasingly require it, with the EU AI Act, GDPR, and fair lending rules all demanding meaningful information about automated decisions. Auditors require it, because SOX, internal audit, and statutory audit all need demonstrable controls over the systems that touch the ledger. Business users require it, because controllers and credit managers will not act on AI output they cannot interrogate. And engineering teams require it, because when an AI-native system goes wrong, root-cause analysis is impossible without visibility into model reasoning.

Spectrum: glass-box, gray-box, and black-box models

Not all models are equally opaque. The XAI conversation usually places models on a spectrum.

  • Glass-box models such as linear regression, logistic regression, and small decision trees are inherently interpretable. A human can read the rules and reproduce the decision by hand.
  • Gray-box models such as gradient boosting and random forests are not directly readable, but their outputs come with usable feature importance scores that show which inputs drove the result.
  • Black-box models such as deep neural networks and large language models contain billions of parameters and resist direct inspection. They require post-hoc explanation techniques layered on top.
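
To make the glass-box end of the spectrum concrete, here is a minimal sketch of a logistic scorecard whose decision can be reproduced by hand. The feature names, coefficients, and applicant values are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical coefficients for a toy credit scorecard (not from any real model).
INTERCEPT = -1.0
WEIGHTS = {
    "days_past_due_avg": -0.04,  # more late days lowers the log-odds of approval
    "years_as_customer": 0.30,   # a longer relationship raises it
    "utilisation_ratio": -1.50,  # heavy use of the credit line lowers it
}


def score(applicant: dict) -> tuple[float, dict]:
    """Return the approval probability and each feature's additive log-odds contribution."""
    contributions = {name: WEIGHTS[name] * applicant[name] for name in WEIGHTS}
    log_odds = INTERCEPT + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    return probability, contributions


applicant = {"days_past_due_avg": 10, "years_as_customer": 5, "utilisation_ratio": 0.5}
probability, contributions = score(applicant)

# Every term is visible: the explanation is the model itself.
for name, value in contributions.items():
    print(f"{name}: {value:+.2f}")
print(f"approval probability: {probability:.3f}")
```

Because the contributions simply add up to the log-odds, an auditor can check the arithmetic with a calculator, which is exactly what the more opaque model classes do not allow.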

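More interpretable models are often, though not always, less accurate than complex ones. That is the central trade-off finance leaders face when choosing between a transparent scorecard and a high-performing neural model, and it is worth measuring the actual accuracy gap on your own data before accepting the loss of transparency.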

Common XAI techniques

A handful of techniques dominate practical XAI work in finance:

  • Feature importance: a ranking of which inputs mattered most to a decision, useful for credit and risk models.
  • SHAP values: a game-theoretic method based on Shapley values that attributes contribution to each feature for an individual prediction.
  • LIME: local interpretable model-agnostic explanations, which approximate a complex model with a simpler one around a specific data point.
  • Attention visualisation: showing which input tokens an LLM or transformer focused on when generating its output.
  • Counterfactual explanations: statements of the form "if X had been different by Y, the decision would have changed", which are intuitive for customers and regulators.
  • Citations and source attribution: for RAG systems, every claim in the output is linked to the underlying source document.
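
As an illustration of the idea behind SHAP values, the sketch below computes exact Shapley attributions for a tiny model by averaging each feature's marginal contribution over every feature ordering. It is not the SHAP library, which uses approximations to scale to real models. The `risk_model` function, its features, and the baseline input are hypothetical.

```python
from itertools import permutations


def shapley_values(model, instance: dict, baseline: dict) -> dict:
    """Exact Shapley attribution relative to a single baseline input.

    Enumerates all feature orderings, so it is only practical for a handful of features.
    """
    features = list(instance)
    totals = {name: 0.0 for name in features}
    orderings = list(permutations(features))
    for order in orderings:
        current = dict(baseline)  # start from the reference input
        previous_output = model(current)
        for name in order:
            current[name] = instance[name]  # switch one feature to its actual value
            output = model(current)
            totals[name] += output - previous_output
            previous_output = output
    return {name: total / len(orderings) for name, total in totals.items()}


# Hypothetical risk model with an interaction term, so the attribution is not trivial.
def risk_model(x: dict) -> float:
    return 2.0 * x["overdue_invoices"] + 0.5 * x["dso"] + 0.1 * x["overdue_invoices"] * x["dso"]


instance = {"overdue_invoices": 3.0, "dso": 40.0}
baseline = {"overdue_invoices": 0.0, "dso": 30.0}
attribution = shapley_values(risk_model, instance, baseline)
print(attribution)  # roughly 16.5 for overdue_invoices and 6.5 for dso
```

The attributions sum to the difference between the model's output for this customer and its output for the baseline, which is the property that makes per-decision attribution defensible in an audit.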

Use cases in AR: credit, cash application, dispute, and forecasting

In accounts receivable and order-to-cash, explainability is not a nice-to-have. It is the difference between a system the team trusts and one they bypass.

Credit decisions are the highest-stakes use case. When a customer is denied credit or assigned a tighter limit, the system should be able to show the top factors driving the decision and the counterfactual that would have produced a different outcome. Stating the principal reasons for an adverse credit decision is a regulatory requirement in many jurisdictions.
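
A counterfactual for a credit decision can be sketched as a simple search: change one input in small steps until the outcome flips. The decision rule, its cut-off, and the feature names below are hypothetical, and the search ignores whether the suggested value is realistic for the customer.

```python
def approve(applicant: dict) -> bool:
    """Hypothetical points-based rule with an approval cut-off of 60."""
    points = 100 - 1.5 * applicant["avg_days_late"] - 40 * applicant["utilisation"]
    return points >= 60


def single_feature_counterfactual(applicant: dict, feature: str, step: float, max_steps: int = 100):
    """Move one feature in fixed steps until the decision flips.

    Returns the first value that changes the outcome, or None if none is found.
    """
    original = approve(applicant)
    candidate = dict(applicant)
    for _ in range(max_steps):
        candidate[feature] = round(candidate[feature] + step, 6)
        if approve(candidate) != original:
            return candidate[feature]
    return None


applicant = {"avg_days_late": 20, "utilisation": 0.5}
days_needed = single_feature_counterfactual(applicant, "avg_days_late", step=-1)
utilisation_needed = single_feature_counterfactual(applicant, "utilisation", step=-0.05)

print(f"Approved today: {approve(applicant)}")
print(f"Would be approved if average days late fell to {days_needed}")
print(f"Would be approved if utilisation fell to {utilisation_needed}")
```

A production version would need to restrict the search to changes the customer can actually make and would typically look for the smallest change across several features at once.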

Cash application benefits from confidence scores and matching signals. When AI matches a payment to an invoice, the user should see why: invoice number on the remittance, amount match, customer history, fuzzy text match on the reference field.
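
One way to surface those matching signals is to keep the per-signal breakdown alongside the overall confidence rather than returning a single opaque number. The following sketch assumes hypothetical field names and signal weights, and it leaves out customer history for brevity.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class MatchExplanation:
    confidence: float
    signals: dict  # signal name -> contribution to the confidence score


# Hypothetical weights; a real system would learn or tune these.
SIGNAL_WEIGHTS = {
    "invoice_number_on_remittance": 0.5,
    "amount_match": 0.3,
    "reference_similarity": 0.2,
}


def explain_match(payment: dict, invoice: dict) -> MatchExplanation:
    """Score a payment-invoice pair and keep the breakdown for display to the user."""
    raw = {
        "invoice_number_on_remittance": float(invoice["number"] in payment["remittance_text"]),
        "amount_match": float(abs(payment["amount"] - invoice["amount"]) < 0.01),
        # Fuzzy text match between the payment reference and the customer name.
        "reference_similarity": SequenceMatcher(
            None, payment["reference"].lower(), invoice["customer_name"].lower()
        ).ratio(),
    }
    signals = {name: SIGNAL_WEIGHTS[name] * value for name, value in raw.items()}
    return MatchExplanation(confidence=sum(signals.values()), signals=signals)


payment = {"amount": 1250.00, "remittance_text": "Payment for INV-1042", "reference": "ACME CORP"}
invoice = {"number": "INV-1042", "amount": 1250.00, "customer_name": "Acme Corp"}
result = explain_match(payment, invoice)

for name, contribution in result.signals.items():
    print(f"{name}: {contribution:.2f}")
print(f"confidence: {result.confidence:.2f}")
```

The weighted sum here is only a stand-in for whatever matching model is in use. The design point is that the user interface receives the signals, not just the score.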

Dispute classification needs to explain why a deduction was coded as a pricing issue rather than a shortage or a promotional claim, so analysts can correct miscodes quickly.

Cash forecasting should expose which drivers most influenced a prediction, whether seasonality, a large customer payment delay, or a recent change in DSO.

LLM explainability and its limits

Large language models present the hardest XAI challenge in modern finance stacks. With billions of parameters, no human can trace the exact reasoning path. The practical workarounds are chain-of-thought prompting, where the model writes out its reasoning in plain text, source citations from a RAG layer, and confidence scores. Each helps, but each has limits. A chain-of-thought trace is itself generated text and may not faithfully reflect what the model actually did internally. Citations only work when the retrieval layer surfaced the right documents. Confidence scores from LLMs are often poorly calibrated, so a stated 90 percent confidence does not reliably mean the model is right nine times out of ten.
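
Calibration can be checked rather than assumed. The sketch below computes expected calibration error, a common summary of the gap between stated confidence and observed accuracy, on a small made-up sample of reviewed decisions. The bin count and the data are illustrative assumptions.

```python
def expected_calibration_error(confidences, outcomes, n_bins=5):
    """Weighted average gap between mean confidence and observed accuracy per bin."""
    bins = [[] for _ in range(n_bins)]
    for confidence, correct in zip(confidences, outcomes):
        index = min(int(confidence * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[index].append((confidence, correct))
    total = len(confidences)
    error = 0.0
    for members in bins:
        if not members:
            continue
        mean_confidence = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        error += (len(members) / total) * abs(mean_confidence - accuracy)
    return error


# Made-up review data: stated confidence, and whether a human reviewer agreed.
confidences = [0.95, 0.95, 0.95, 0.95, 0.55, 0.55]
outcomes = [True, True, False, False, True, False]
ece = expected_calibration_error(confidences, outcomes)
print(f"expected calibration error: {ece:.3f}")
```

A value near zero means stated confidence tracks real accuracy. In this toy sample the model claims 95 percent on decisions that are right only half the time, so the error is large.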

The honest position for finance leaders is that LLM explanations are useful, auditable artefacts but not ground truth about model cognition. They are best paired with deterministic guardrails, human review thresholds, and structured logs that capture what was retrieved, what was generated, and what action was taken.
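
A structured log entry of that kind might look like the following sketch. The field names, the model version string, and the document reference are hypothetical, and the content hash is only a fingerprint for comparing against a copy held elsewhere, not a complete tamper-proofing scheme.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_audit_record(model_version, retrieved_sources, generated_text, action):
    """Assemble one log entry covering what was retrieved, generated, and done."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "retrieved_sources": retrieved_sources,  # document identifiers, not full text
        "generated_text": generated_text,
        "action": action,
    }
    # Fingerprint of the content fields (timestamp excluded) for later comparison.
    content = {key: value for key, value in record.items() if key != "timestamp"}
    payload = json.dumps(content, sort_keys=True)
    record["content_sha256"] = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return record


record = build_audit_record(
    model_version="dispute-classifier-0.3",         # hypothetical identifier
    retrieved_sources=["contract-771#clause-4.2"],  # hypothetical document reference
    generated_text="Deduction coded as pricing: invoiced unit price exceeds contract price.",
    action="routed_to_pricing_queue",
)
print(json.dumps(record, indent=2))
```

Logging identifiers for the retrieved documents, rather than only the generated text, is what lets a reviewer later confirm that a citation resolves to a real source.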

How to evaluate AI-native AR for explainability

When assessing an AI-native or agentic AR platform, finance leaders should look for six concrete capabilities.

  • Confidence scores on every automated decision, exposed in the user interface and in logs.
  • Source citations for any output produced by a RAG or LLM component, linking back to the underlying remittance, contract clause, or policy document.
  • Feature attribution for credit, risk, and forecasting models, so users can see the top drivers behind a score or projection.
  • Audit-friendly logs showing inputs, model version, decision, and rationale for every transaction, retained per your statutory requirements.
  • Clear handoff points to human review when confidence falls below a threshold, with the reason for escalation made visible.
  • Documentation aligned to emerging standards and regulation, such as ISO/IEC 42001 and the EU AI Act risk classification framework.
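
The handoff capability in particular is easy to test in a demo. A minimal sketch of threshold-based routing with a visible escalation reason is shown here. The threshold value, the route names, and the citation check are assumptions for illustration.

```python
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.85  # hypothetical cut-off; set per decision type and risk appetite


@dataclass
class RoutingDecision:
    route: str   # "auto" or "human_review"
    reason: str  # shown to the reviewer and written to the log


def route_decision(confidence: float, has_citation: bool) -> RoutingDecision:
    """Escalate when confidence is low or the output cannot be traced to a source."""
    if not has_citation:
        return RoutingDecision("human_review", "no source citation attached to the output")
    if confidence < REVIEW_THRESHOLD:
        return RoutingDecision(
            "human_review",
            f"confidence {confidence:.2f} is below threshold {REVIEW_THRESHOLD:.2f}",
        )
    return RoutingDecision("auto", "confidence and citation checks passed")


for confidence, has_citation in [(0.97, True), (0.60, True), (0.99, False)]:
    decision = route_decision(confidence, has_citation)
    print(f"{confidence:.2f} citation={has_citation}: {decision.route} ({decision.reason})")
```

Returning the reason with the route means the reviewer sees why the item landed in their queue, and the same text can be written to the audit log.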

Beware of explainability theatre, where vendors present polished but oversimplified or post-hoc rationalisations that satisfy a checklist without giving auditors or operators anything they can challenge. The right test is whether a controller, an external auditor, and a regulator can each get the answer they need from the same system, in language that fits their role.

Frequently asked questions

Is Explainable AI legally required for AR and credit systems?

For many high-risk uses, yes. The EU AI Act classifies AI systems that assess the creditworthiness of natural persons as high-risk, which brings transparency and documentation obligations. Under the GDPR, Article 22 restricts solely automated decisions that significantly affect individuals, and the transparency provisions in Articles 13 to 15 require meaningful information about the logic involved. In the US, fair lending rules such as ECOA and Regulation B require adverse action reasons for credit denials. Even where not strictly required, internal and statutory auditors increasingly expect explainability documentation as part of SOX and SOC 2 reviews.

What is the difference between interpretability and explainability?

Interpretability usually refers to how well a human can directly understand a model's internal mechanics, which is high for linear models and decision trees. Explainability is the broader concept of being able to provide a human-understandable account of a decision, including post-hoc techniques layered on top of black-box models. In practice the two terms are often used interchangeably, but interpretability is closer to the model itself while explainability is closer to the user experience.

Can large language models really be made explainable?

Only partially. LLMs can produce chain-of-thought traces, cite sources retrieved through RAG, and emit confidence scores, all of which give auditors and users useful artefacts. But these are generated outputs, not direct windows into the model's billions of parameters, and they may not perfectly reflect the underlying computation. Treat LLM explanations as auditable evidence paired with deterministic guardrails and human review, not as ground truth.

What are SHAP and LIME, and when should we use them?

SHAP values use a game-theoretic approach to attribute each feature's contribution to a specific prediction, giving consistent and theoretically sound per-feature importance. LIME approximates a complex model with a simpler one around a single data point to explain that prediction locally. SHAP is the more common choice for credit scoring and risk models where you need defensible per-decision attribution. LIME is useful for quick local explanations and works with almost any model type.

How does Explainable AI relate to AI Governance?

AI Governance is the broader framework of policies, controls, roles, and processes that ensure AI systems are used safely, ethically, and in line with regulation. Explainable AI is one of the technical capabilities that governance frameworks require. You cannot meaningfully govern an AI system you cannot explain, and most governance frameworks, including ISO/IEC 42001 and the EU AI Act, treat transparency and explainability as a foundational requirement for high-risk systems.

What is explainability theatre and how do we avoid it?

Explainability theatre is the practice of showing polished but oversimplified or post-hoc rationalisations that satisfy a compliance checklist without giving operators, auditors, or regulators anything they can actually challenge. To avoid it, demand that explanations are reproducible from logs, that confidence scores are calibrated against real outcomes, that source citations resolve to actual documents, and that the same system can answer questions for a controller, an external auditor, and a regulator in language appropriate to each.
