35 GenAI in Banking & Finance: Event-Driven Agentic AI

Event-Driven Agentic AI: A Practical Architecture for Financial-Crime Monitoring

Financial crime doesn’t happen in batches. It happens in real time, as a succession of payments, adjustments, and transfers — a continual stream of events that, to a human analyst, tell a story about behaviour. Traditional systems, however, often treat detection as a periodic job: run overnight, review the flagged cases the next day, rinse and repeat.

What if instead we treated every transaction as something worth reacting to instantly? What if our system were built not as a single monolithic workflow, but as a tapestry of independent, autonomous listeners — each attuned to specific signals in the data, capable of reasoning and explanation, and orchestrated not by a rigid workflow but by the very events they observe?

This is the promise of event-driven, agentic AI orchestration, and in the Python example we walk through here, you can see how a compact, multi-agent AI pipeline can detect suspicious fast cash-out activity — and generate human-readable explanations for alerts — without sprawling complexity.

At the highest level, the system answers a straightforward business question:

Is this account rapidly cashing out most of the funds it has received within a short time window — and if so, should we generate an alert?

In this architecture, the answer to that question does not live in one place. Instead, it emerges as a series of facts and reactions that unfold over time.


Strategy & Risk Lens

The code frames fast cash-out as a behavioural problem, not a transaction-level one. Individual payments are treated as neutral facts and published as events, allowing the system to build context over time rather than making premature decisions. As transactions accumulate, the last 30 minutes of activity are continuously recalculated, capturing the short-term exposure that risk teams care about most. Detection only happens once this context exists, applying a clear, auditable rule and producing a human-readable explanation for each alert. The resulting flow — payment → short-term exposure → explainable alert — mirrors how financial crime actually unfolds and enables real-time, defensible decision-making.


Link to Code: https://github.com/sujitpatange/Multiagent/tree/0f0593c011a5836f8ae40b43a2cac529ba5066bd/fraud_detection

From Stream to Insight: Why Event-Driven Makes Sense

The first key idea is that transactions should be treated as events, not records. Each meaningful occurrence — whether it’s a new payment, a recalculated balance window, or an alert trigger — becomes a fact that can be published, consumed, and reacted to by different parts of the system.

In the example codebase, this idea manifests in the events.py file: a set of domain-specific event classes coupled with a minimal in-memory event bus. Events like PaymentObserved, BalanceWindowUpdated, and FastCashoutAlertRaised are not commands telling the system what to do next — they are statements about what has happened. That difference is subtle, but profound. Facts are auditable, replayable, and independent of who reacts to them.
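As a sketch, the event types described above can be modelled as small immutable dataclasses; the field names beyond those mentioned in this article are illustrative assumptions, not necessarily the repository's exact shapes:

```python
from dataclasses import dataclass
from datetime import datetime

# Events are statements about what has happened -- immutable, auditable facts.
@dataclass(frozen=True)
class PaymentObserved:
    account_id: str
    amount: float
    direction: str          # "in" or "out"
    timestamp: datetime

@dataclass(frozen=True)
class BalanceWindowUpdated:
    account_id: str
    inbound_total: float
    outbound_total: float

@dataclass(frozen=True)
class FastCashoutAlertRaised:
    account_id: str
    ratio: float
    reason: str
```

Because the events are frozen, any consumer can hold on to them, log them, or replay them without risk of mutation.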

The event bus implements a basic publish/subscribe pattern: agents express interest in certain event types, and when those events are published, their handlers fire. For local development and small deployments, an in-process event bus keeps things simple and easy to debug; for larger systems, this mechanism can be replaced with Kafka, RabbitMQ, or a cloud pub/sub service without touching the agents themselves. This layer becomes the integration backbone — lightweight, decoupled, and flexible.
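A minimal in-process bus of this kind can be sketched in a few lines; this is a simplified stand-in for the one in events.py, keyed by event class:

```python
from collections import defaultdict
from typing import Any, Callable

class EventBus:
    """Minimal in-process publish/subscribe bus keyed by event type."""

    def __init__(self) -> None:
        self._handlers: dict[type, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, event_type: type, handler: Callable[[Any], None]) -> None:
        # Agents express interest in an event type by registering a handler.
        self._handlers[event_type].append(handler)

    def publish(self, event: Any) -> None:
        # Fire every handler registered for this event's concrete type.
        for handler in self._handlers[type(event)]:
            handler(event)
```

Swapping this for Kafka or RabbitMQ means reimplementing `subscribe` and `publish` against the broker's client library while the agents themselves stay unchanged.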

This approach naturally supports extensibility: new behaviour can be introduced simply by subscribing to an existing event type. Want to push alerts into Slack? Write a new subscriber to FastCashoutAlertRaised. Want to record metrics in a time-series database? Subscribe to BalanceWindowUpdated. In this world, components don’t call each other directly — they react to the story of events as it unfolds.
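For instance, a hypothetical Slack-style notifier can be bolted on purely by subscription; the bus and handler here are simplified stand-ins, and in production the handler would call a real messaging API:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class FastCashoutAlertRaised:
    account_id: str
    ratio: float
    reason: str

class EventBus:
    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event):
        for handler in self._handlers[type(event)]:
            handler(event)

sent_messages = []

def slack_notifier(alert: FastCashoutAlertRaised) -> None:
    # Stand-in for a Slack API call: record the message we would send.
    sent_messages.append(f"[ALERT] {alert.account_id}: {alert.reason}")

bus = EventBus()
bus.subscribe(FastCashoutAlertRaised, slack_notifier)  # the only wiring needed
bus.publish(FastCashoutAlertRaised("acct-1", 0.9, "rapid cash-out"))
```

Notice that no existing agent had to change: the notifier simply joins the audience for events that were already being published.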


Agents as Autonomous Actors

Once events are our lingua franca, the focus turns to the agents — self-contained units of logic that react to events they care about, maintain their own internal state, and emit new events when appropriate.

In the example, there are three agents.

The first, the IngestionAgent, serves as the connector from raw transaction feeds into the event-driven world. It takes basic transaction parameters — account ID, amount, direction, timestamp — and wraps them into a PaymentObserved event. There is no business logic here, no thresholds or risk decisions. Instead, this agent plays the role of an adapter, allowing core banking or payment systems to remain decoupled from the downstream detection logic.

This decoupling matters. In many institutions, payments can originate from multiple sources — core banking ledgers, batch ETL jobs, real-time switch feeds, third-party wallets — and forcing all of them into the same detection pipeline without an abstraction layer can lead to brittle integrations. Here, ingestion merely speaks the system’s event language and lets downstream agents make sense of it.
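An IngestionAgent along these lines might look like the following sketch, assuming a bus that exposes publish(); the PaymentObserved fields mirror the parameters listed above:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class PaymentObserved:
    account_id: str
    amount: float
    direction: str      # "in" or "out"
    timestamp: datetime

class IngestionAgent:
    """Adapter from raw transaction feeds into the event-driven world."""

    def __init__(self, bus) -> None:
        self._bus = bus

    def ingest(self, account_id: str, amount: float,
               direction: str, timestamp: datetime) -> None:
        # No thresholds, no risk decisions -- just translate the raw
        # parameters into the system's event language and publish.
        self._bus.publish(PaymentObserved(account_id, amount, direction, timestamp))
```

Each upstream source — ledger, ETL job, real-time switch — only needs to call `ingest`; everything downstream stays oblivious to where the payment came from.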

The second agent — the RollingWindowAgent — introduces temporal context. Financial risk often depends not just on what happens, but when it happens. For this reason, this agent subscribes to PaymentObserved events and maintains an in-memory rolling window of recent activity per account. As each payment arrives, it prunes entries older than 30 minutes, computes the total inbound and outbound volume over that period, and emits a BalanceWindowUpdated event containing these metrics.

From a domain perspective, this agent embodies how risk leaders think about exposure and velocity. It answers a simple analytical question in real time: “What has this account done in the last half hour?” By centralizing this temporal logic, the rest of the system can focus on decision-making without worrying about windowing, pruning, or state maintenance.
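A sketch of this windowing logic, using the event shapes assumed earlier in this article (the repository's exact pruning strategy may differ):

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=30)

@dataclass(frozen=True)
class PaymentObserved:
    account_id: str
    amount: float
    direction: str      # "in" or "out"
    timestamp: datetime

@dataclass(frozen=True)
class BalanceWindowUpdated:
    account_id: str
    inbound_total: float
    outbound_total: float

class RollingWindowAgent:
    """Maintains a 30-minute rolling window of activity per account."""

    def __init__(self, bus) -> None:
        self._bus = bus
        self._recent = defaultdict(list)  # account_id -> list[PaymentObserved]

    def on_payment(self, evt: PaymentObserved) -> None:
        window = self._recent[evt.account_id]
        window.append(evt)
        # Prune anything older than 30 minutes relative to the newest payment.
        cutoff = evt.timestamp - WINDOW
        window[:] = [p for p in window if p.timestamp >= cutoff]
        inbound = sum(p.amount for p in window if p.direction == "in")
        outbound = sum(p.amount for p in window if p.direction == "out")
        self._bus.publish(BalanceWindowUpdated(evt.account_id, inbound, outbound))
```

Downstream agents never see individual payments, only the aggregated window metrics — exactly the separation of concerns described above.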

The third agent is the FastCashoutDetectorAgent — where business rules and AI reasoning converge. This agent listens for BalanceWindowUpdated events. When it receives one, it computes the outbound-to-inbound ratio for the current window. If there is no inbound activity, it quietly returns. If there is, it constructs a detailed prompt that includes:

  • The business rule (suspicious if ratio ≥ 0.8).

  • Multiple labeled examples to anchor the model.

  • The actual inbound, outbound, and ratio values from the window.

This prompt goes to a local LLM — hosted via Ollama with a reasonably capable model such as Gemma3:4B — and the model returns a simple JSON payload indicating whether the behaviour is suspicious and a human-readable reason. If the model’s verdict flags the case as suspicious, the agent publishes a FastCashoutAlertRaised event containing the ratio and the reason.
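A sketch of this detection step, with the LLM call injected as a plain callable so that either an Ollama-backed client or a deterministic stub can be plugged in; the prompt wording and JSON shape here are illustrative assumptions, not the repository's exact prompt:

```python
import json
from dataclasses import dataclass
from typing import Callable

THRESHOLD = 0.8  # the auditable business rule: suspicious if ratio >= 0.8

@dataclass(frozen=True)
class BalanceWindowUpdated:
    account_id: str
    inbound_total: float
    outbound_total: float

@dataclass(frozen=True)
class FastCashoutAlertRaised:
    account_id: str
    ratio: float
    reason: str

class FastCashoutDetectorAgent:
    """Applies the threshold rule and asks an LLM for a readable rationale."""

    def __init__(self, bus, llm: Callable[[str], str]) -> None:
        self._bus = bus
        self._llm = llm  # e.g. a thin wrapper around a local Ollama model

    def on_window(self, evt: BalanceWindowUpdated) -> None:
        if evt.inbound_total == 0:
            return  # no inbound activity: nothing to assess
        ratio = evt.outbound_total / evt.inbound_total
        prompt = (
            f"Rule: behaviour is suspicious if outbound/inbound ratio >= {THRESHOLD}.\n"
            f"Inbound: {evt.inbound_total}, outbound: {evt.outbound_total}, "
            f"ratio: {ratio:.2f}.\n"
            'Respond as JSON: {"suspicious": true|false, "reason": "..."}'
        )
        verdict = json.loads(self._llm(prompt))
        if verdict.get("suspicious"):
            self._bus.publish(
                FastCashoutAlertRaised(evt.account_id, ratio, verdict["reason"]))
```

In the real pipeline the `llm` callable would post the prompt to the locally hosted model; in tests, a stub that returns a fixed JSON verdict keeps the scenario reproducible.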

This integration pattern strikes a careful balance between deterministic logic and generative intelligence. The rule itself — the threshold — remains transparent and auditable. Compliance and risk teams can review it, tune it, and approve it. The LLM does not make the rule; it explains the outcome against a clearly defined rule, producing consistent, human-readable rationale that can be stored, presented to analysts, or surfaced in case-management systems.


Validating the Pipeline with Scenarios

A theory is only useful when it can be validated. That’s where the orchestration in main.py comes in. Rather than being a throwaway demo script, the simulation acts as a scenario-based test harness that serves both engineering and risk validation purposes.

At startup, the script creates the event bus and instantiates all agents: ingestion, rolling window, and detector. It also registers a simple alert handler that prints alert details so we can observe behaviour.

The simulation then walks through two contrasting scenarios.

In the first, an account receives 1,000 units of inbound funds (imagine a salary credit), and 20 minutes later initiates an outbound transfer of 900 units. At this point, the rolling window reflects an inbound volume of 1,000, outbound of 900, and a ratio of 0.9 — above the configured threshold. The detector builds its prompt, the model returns a “suspicious” verdict with a reason, and an alert event is raised and printed.

Functionally, this scenario mirrors the behaviour operators care about: rapid cash-out following a credit, which is often a red flag for fraud or mule activity. The inclusion of an explainable reason adds value beyond the bare alert itself, reducing the friction analysts experience when triaging cases.

The second scenario uses the same credit but splits outbound activity into smaller, benign transactions: 50 units at 10 minutes and 100 units at 25 minutes. The rolling window metrics here show inbound still at 1,000 but outbound only 150, yielding a low ratio that should not trigger suspicion. In this case, the detector still constructs its prompt, but the model returns a non-suspicious verdict, and the system produces no alert.

These paired scenarios serve as both demonstration and regression tests: they define expected system behaviour for both suspicious and normal activity, making it easy to validate that changes — whether to thresholds, prompt design, or model versions — have not unintentionally altered detection outcomes.
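Stripped to their essentials, the two scenarios reduce to regression checks on the window metrics; in this sketch the model's verdict is replaced with a deterministic stand-in that applies the same rule:

```python
THRESHOLD = 0.8  # suspicious if outbound/inbound ratio >= 0.8

def is_suspicious(inbound: float, outbound: float) -> bool:
    # Deterministic stand-in for the model's verdict against the same rule.
    if inbound == 0:
        return False
    return outbound / inbound >= THRESHOLD

# Scenario 1: 1,000 in, 900 out within 30 minutes -> ratio 0.9, alert expected.
assert is_suspicious(1000, 900)

# Scenario 2: 1,000 in, only 50 + 100 out -> ratio 0.15, no alert expected.
assert not is_suspicious(1000, 50 + 100)
```

Checks like these can run in CI so that changes to thresholds, prompts, or model versions are caught before they silently alter detection outcomes.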


Beyond Fast Cash-Out: Extending the Pattern

What makes this architecture compelling is how easy it is to grow from here.

Because the system is built around events and autonomous agents, new behaviour can be added without rewriting core logic. Imagine a NotificationAgent that listens for alert events and pushes them to Slack or email channels. Or a CaseCreationAgent that writes alerts into a case-management system with associated context and rationale from the LLM. You could build an AnalyticsAgent that logs balance windows into a data warehouse for trend analysis, or new detectors subscribed to balance windows for other typologies like structuring or velocity spikes.

None of these require touching the ingestion or rolling window logic. They simply react to the events that are already published.


A Practical Architecture for Modern Risk Programs

In a world where financial services firms are increasingly exploring agentic AI to automate complex workflows — from customer service to compliance — this pattern offers a grounded, explainable, and extensible path forward. What we have described here is not a theoretical abstraction; it’s a pragmatic blueprint that can be implemented today with open-source tooling and local LLMs, keeping sensitive data inside an institution’s infrastructure while still harnessing generative reasoning.

The mantra is simple: let events describe what happened, and let agents independently decide what actions to take. It’s a model that aligns with how risk professionals think, how engineers build resilient systems, and how AI can be safely integrated into regulated environments.

If you’re exploring how to modernize financial-crime detection — whether for fraud, AML, or hybrid risk use cases — event-driven agentic AI offers a compelling architecture that blends real-time detection with explainable, audit-ready intelligence.

✍️ Author’s Note

This blog reflects the author’s personal point of view — shaped by 25+ years of industry experience, along with a deep passion for continuous learning and teaching.
The content has been phrased and structured using Generative AI tools, with the intent to make it engaging, accessible, and insightful for a broader audience.
