36 GenAI in Banking & Finance: Consensus & Adversarial Panels for Financial Crime Investigations
Consensus & Adversarial Panels for Financial Crime Investigations
In the last two posts, we discussed supervisor-based agent hierarchies and event-driven orchestration of multiple agents. Event-driven architectures work well for transaction screening and alert generation. Supervisor-based agent hierarchies work well for deterministic workflows. Financial crime investigations, however, frequently require subjective judgment under regulatory scrutiny. Decisions must be explainable, defensible, and reproducible months or years after the fact.
In these scenarios, a single LLM agent is insufficient. Different stakeholders interpret the same facts differently, and regulators expect evidence that competing viewpoints were considered. The Consensus & Adversarial Panel pattern addresses this by running multiple independent agents on identical inputs, aggregating their conclusions mathematically, and applying a critic agent to evaluate the quality of agreement.
The Financial_Crime_Multi_Agent.ipynb notebook implements this pattern end-to-end using Ollama with the gemma3:4b model. Four peer agents analyze the same normalized case data, their outputs are aggregated into a single decision with explicit consensus metrics, and an adversarial critic determines whether the result is strong enough to stand or should be escalated for human review.
Problem Context
Financial crime decisions often sit at the intersection of compliance, commercial risk, customer behavior, and legal interpretation. A typical high-risk case might involve a large international transfer routed through layered holding companies, partial exposure to a politically exposed person, existing alerts, and meaningful revenue impact.
When analyzed by different roles, the same case can yield conflicting conclusions. A compliance-oriented perspective may focus on jurisdictional risk and escalation thresholds. A business or risk perspective may emphasize legitimacy and revenue contribution. A legal perspective may focus on SAR triggers and precedent. The challenge is not choosing one opinion, but producing a decision that reflects all of them in a structured and auditable way.
Consensus & Adversarial Panel Pattern
The panel pattern treats decision-making as a controlled form of disagreement. Each agent analyzes the same case independently. No agent has visibility into the others’ conclusions. Their outputs are then aggregated using simple, explicit rules. A separate critic agent reviews both the individual rationales and the aggregated outcome to identify weak consensus, shallow reasoning, or contradictions.
The result is a final decision accompanied by all intermediate artifacts: the normalized input, each agent’s reasoning, the vote distribution, the computed consensus level, and the critic’s assessment.
System Architecture
The implementation is structured as a deterministic pipeline. Raw case data is first normalized into strongly typed data models. The normalized case is then passed to multiple panel agents in parallel. Each agent produces a structured JSON response. These responses are aggregated into a final recommendation and risk score. The critic evaluates the quality of the panel output. The full result is returned as an audit-ready object.
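The flow described above can be sketched as a single function. This is a minimal illustration, not the notebook's code: the name run_panel, the stage callables, and the keys of the returned dictionary are all assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_panel(
    case: dict,
    agents: Dict[str, Callable[[dict], dict]],
    aggregate: Callable[[List[dict]], dict],
    critic: Callable[[dict, List[dict], dict], dict],
) -> dict:
    """Hypothetical pipeline: fan out to agents, aggregate, critique, return an audit record."""
    # A fixed agent order keeps the audit record reproducible across runs.
    names = sorted(agents)

    # Each agent receives the same case and never sees another agent's output.
    with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
        futures = {name: pool.submit(agents[name], case) for name in names}
        responses = [{"agent": name, **futures[name].result()} for name in names]

    decision = aggregate(responses)
    review = critic(case, responses, decision)

    # Every intermediate artifact is kept alongside the final decision.
    return {
        "case": case,
        "agent_responses": responses,
        "decision": decision,
        "critic": review,
    }
```

Passing the stages in as callables keeps the orchestration free of any model-specific code, so each stage can be replaced or tested with stubs.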
Data Normalization
The system begins by converting raw case inputs into structured dataclasses. Transactions, customer profiles, and case metadata are represented explicitly. This guarantees that every agent receives identical information and that any divergence in output is attributable to agent reasoning rather than data inconsistencies.
Normalization serves two purposes. Technically, it enforces schema consistency and predictable serialization into JSON prompts. Functionally, it ensures that investigators and auditors can verify that all agents evaluated the same factual record.
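As a rough sketch of what such normalization might look like, the example below uses frozen dataclasses and a deterministic JSON serializer. The class names, field names, and raw input keys are hypothetical; the notebook's actual schema may differ.

```python
import json
from dataclasses import asdict, dataclass


# All field names below are illustrative assumptions, not the notebook's schema.
@dataclass(frozen=True)
class Transaction:
    amount: float
    currency: str
    origin_country: str
    destination_country: str


@dataclass(frozen=True)
class CustomerProfile:
    customer_id: str
    pep_exposure: bool
    open_alerts: int


@dataclass(frozen=True)
class NormalizedCase:
    case_id: str
    customer: CustomerProfile
    transactions: tuple


def normalize_case(raw: dict) -> NormalizedCase:
    """Convert a loosely typed raw dict into typed, immutable records."""
    customer = CustomerProfile(
        customer_id=str(raw["customer"]["id"]),
        pep_exposure=bool(raw["customer"].get("pep_exposure", False)),
        open_alerts=int(raw["customer"].get("open_alerts", 0)),
    )
    transactions = tuple(
        Transaction(
            amount=float(t["amount"]),
            currency=str(t["currency"]).upper(),
            origin_country=str(t["from"]).upper(),
            destination_country=str(t["to"]).upper(),
        )
        for t in raw.get("transactions", [])
    )
    return NormalizedCase(str(raw["case_id"]), customer, transactions)


def to_prompt_json(case: NormalizedCase) -> str:
    """Serialize with sorted keys so every agent receives byte-identical input."""
    return json.dumps(asdict(case), sort_keys=True, indent=2)
```

Freezing the dataclasses means no agent or helper can mutate the case after normalization, which supports the claim that all agents evaluated the same record.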
LLM Interface
All agents use a shared Ollama client configured for the gemma3:4b model. The client enforces low-temperature generation to reduce variance and simplify downstream parsing. The only difference between agents is the system prompt that defines their persona and analytical bias.
This approach avoids model sprawl while still producing diverse reasoning paths.
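One way to build such a shared client is to call Ollama's local REST endpoint directly, as sketched below. The notebook may use the ollama Python package instead; the function names, the temperature of 0.1, and the timeout are assumptions chosen for illustration.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "gemma3:4b"


def build_payload(system_prompt: str, user_prompt: str, temperature: float = 0.1) -> dict:
    """Build the request body; only the system prompt varies between agents."""
    return {
        "model": MODEL,
        "system": system_prompt,
        "prompt": user_prompt,
        "format": "json",   # ask Ollama to constrain the output to valid JSON
        "stream": False,    # return one complete response instead of chunks
        "options": {"temperature": temperature},  # 0.1 is an assumed value
    }


def generate(system_prompt: str, user_prompt: str,
             temperature: float = 0.1, timeout: float = 120.0) -> str:
    """Send one request to a locally running Ollama server and return the model text."""
    body = json.dumps(build_payload(system_prompt, user_prompt, temperature)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

Because every agent goes through the same generate function with the same model and temperature, the persona prompt is the only variable between them.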
Agent Abstraction
Each panel agent is an instance of a common BaseAgent class. The agent receives a normalized case, constructs a prompt containing the full case data, and requests a structured JSON response containing a rationale, a numeric risk score, and a binary recommendation.
The abstraction isolates persona differences to prompt design rather than code logic. Error handling defaults to conservative outcomes if parsing fails, which is consistent with financial crime risk management expectations.
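A minimal sketch of this abstraction follows. The BaseAgent name comes from the notebook, but the constructor signature, the AgentResponse type, the SUSPICIOUS and CLEAR labels, and the fallback score of 100 are assumptions made for this example. The LLM is injected as a plain callable so the class can be exercised without a running model.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class AgentResponse:
    agent: str
    rationale: str
    risk_score: float      # assumed scale: 0 to 100
    recommendation: str    # assumed labels: "SUSPICIOUS" or "CLEAR"


class BaseAgent:
    def __init__(self, name: str, system_prompt: str, llm: Callable[[str, str], str]):
        self.name = name
        self.system_prompt = system_prompt  # the persona lives here, not in code
        self.llm = llm                      # callable(system_prompt, user_prompt) -> text

    def analyze(self, case_json: str) -> AgentResponse:
        prompt = (
            "Analyze the following case and reply with JSON containing "
            '"rationale", "risk_score" (0-100), and "recommendation" '
            '("SUSPICIOUS" or "CLEAR").\n\n' + case_json
        )
        try:
            data = json.loads(self.llm(self.system_prompt, prompt))
            score = min(100.0, max(0.0, float(data["risk_score"])))
            recommendation = str(data["recommendation"]).strip().upper()
            if recommendation not in ("SUSPICIOUS", "CLEAR"):
                raise ValueError(f"unexpected recommendation: {recommendation}")
            return AgentResponse(self.name, str(data["rationale"]), score, recommendation)
        except (json.JSONDecodeError, KeyError, TypeError, ValueError):
            # Conservative default: an unreadable answer is treated as high risk.
            return AgentResponse(
                self.name,
                "Model output could not be parsed; defaulting to a conservative outcome.",
                100.0,
                "SUSPICIOUS",
            )
```

With this shape, adding a fifth viewpoint to the panel would only require a new system prompt, not a new class.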
Panel Composition
The notebook defines four agents representing distinct stakeholder viewpoints. The compliance agent applies strict AML and KYC interpretations. The risk appetite agent balances regulatory exposure against business impact. The customer context agent evaluates historical behavior and relationship consistency. The legal and regulatory agent focuses on jurisdictional rules and reporting thresholds.
An odd number of agents is not required because the system does not rely solely on binary voting. Disagreement is explicitly measured and preserved rather than hidden.
Aggregation Logic
Agent outputs are aggregated using simple, transparent mathematics. Risk scores are averaged. Recommendations are resolved via majority vote. Consensus is expressed as the proportion of agents agreeing with the majority outcome.
This produces a decision that can be stated quantitatively, such as a final risk score of 75 with a consensus level of 0.75. These metrics can be logged, trended, and reviewed over time.
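The arithmetic can be sketched in a few lines. The function name, the dictionary keys, the SUSPICIOUS and CLEAR labels, and the tie-breaking rule are assumptions; the post does not say how the notebook resolves a two-against-two split, so this sketch leans toward the conservative label in that case.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List


def aggregate(responses: List[Dict]) -> Dict:
    """Average the risk scores, take the majority vote, and measure agreement."""
    scores = [float(r["risk_score"]) for r in responses]
    votes = Counter(r["recommendation"] for r in responses)

    top_count = max(votes.values())
    leaders = sorted(label for label, count in votes.items() if count == top_count)

    # Assumed tie-break: if SUSPICIOUS is among the tied leaders, prefer it.
    if len(leaders) > 1 and "SUSPICIOUS" in leaders:
        recommendation = "SUSPICIOUS"
    else:
        recommendation = leaders[0]

    return {
        "risk_score": mean(scores),
        "recommendation": recommendation,
        # Share of agents that voted for the winning (or tied-for-first) label.
        "consensus": top_count / len(responses),
        "votes": dict(votes),
    }


# Illustrative scores only; they are chosen to reproduce the 75 / 0.75 example.
panel = [
    {"agent": "compliance", "risk_score": 90, "recommendation": "SUSPICIOUS"},
    {"agent": "risk_appetite", "risk_score": 80, "recommendation": "SUSPICIOUS"},
    {"agent": "legal", "risk_score": 85, "recommendation": "SUSPICIOUS"},
    {"agent": "customer_context", "risk_score": 45, "recommendation": "CLEAR"},
]
result = aggregate(panel)
```

Here result carries a risk score of 75, a recommendation of SUSPICIOUS, and a consensus of 0.75, with the raw vote counts preserved for the audit trail.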
Adversarial Critic
The critic agent receives the full case data, all individual agent responses, and the aggregated result. Its role is not to re-decide the case, but to evaluate the quality of the panel decision. It looks for weak rationales, internal contradictions, and cases where consensus is mathematically strong but conceptually fragile.
If the critic identifies sufficient issues, the case is flagged for manual review. The critic’s output becomes part of the permanent record.
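A hedged sketch of the critic step is shown below. The prompt wording, the run_critic name, the response keys, and the issue_threshold parameter are all assumptions; the post does not specify how many issues count as sufficient, so the threshold is left configurable and defaults to one.

```python
import json
from typing import Callable, Dict, List

# Hypothetical prompt; the notebook's actual critic prompt is not shown in the post.
CRITIC_SYSTEM_PROMPT = (
    "You are an adversarial reviewer. Do not re-decide the case. "
    "List weak rationales, internal contradictions, and fragile consensus. "
    'Reply as JSON: {"assessment": "<text>", "issues": ["<issue>", ...]}.'
)


def run_critic(
    case: Dict,
    responses: List[Dict],
    decision: Dict,
    llm: Callable[[str, str], str],
    issue_threshold: int = 1,
) -> Dict:
    """Ask the critic to review the panel output and decide whether to escalate."""
    # The critic sees the case, every agent response, and the aggregated result.
    prompt = json.dumps(
        {"case": case, "agent_responses": responses, "aggregated_result": decision},
        sort_keys=True,
    )
    try:
        data = json.loads(llm(CRITIC_SYSTEM_PROMPT, prompt))
        issues = data["issues"]
        if not isinstance(issues, list):
            raise ValueError("issues must be a list")
        issues = [str(issue) for issue in issues]
        assessment = str(data["assessment"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        # An unreadable critique is itself treated as a reason for human review.
        assessment = "Critic output could not be parsed."
        issues = ["unparseable critic output"]

    return {
        "assessment": assessment,
        "issues": issues,
        "manual_review": len(issues) >= issue_threshold,
    }
```

The returned dictionary can be stored unchanged with the rest of the case record, so the critic's reasoning remains available to later reviewers.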
Example Execution
In the sample case included in the notebook, three agents recommend escalation with high risk scores while one agent recommends clearance based on customer history. The aggregator produces a suspicious outcome with a 75 percent consensus. The critic acknowledges the strong majority but flags the minority rationale as underdeveloped, recommending human validation.
This demonstrates that consensus alone is not treated as sufficient. Reasoning quality matters.
Properties of the Design
From a technical perspective, the system emphasizes structured inputs and outputs, deterministic aggregation, modular agent composition, and local execution for data privacy. From a functional perspective, it produces regulator-ready decision provenance, aligns competing stakeholder views, and enables targeted human escalation rather than blanket review.
Applicability
This pattern is well suited for complex approvals, investigation closures, PEP-related decisions, and situations where regulatory “four-eyes” principles apply. It is not intended for high-volume screening, simple rules-based alerts, or ultra-low-latency decision paths.
Conclusion
The Consensus & Adversarial Panel pattern treats financial crime decisions as governance problems rather than pure prediction tasks. By explicitly modeling disagreement, aggregating it transparently, and critiquing the result, the system produces decisions that are explainable, auditable, and aligned with real investigative workflows.
✍️ Author’s Note
This blog reflects the author’s personal point of view — shaped by 25+ years of industry experience, along with a deep passion for continuous learning and teaching.
The content has been phrased and structured using Generative AI tools, with the intent to make it engaging, accessible, and insightful for a broader audience.