GenAI in Banking & Finance: Post 2
100% On-Premise, 100% Open-Source Sanctions Screening and Fraud Detection Pipeline with AI-Powered Explanations
In the first post of this series, we explored how Generative AI (GenAI) is redefining fraud detection, shifting from static, rule-based filters to dynamic, context-aware intelligence. Today, let’s go deeper with a hands-on example: building a fraud detection and sanctions screening pipeline that not only detects issues but also explains them using GenAI.
We’ll extend the classic anomaly detection setup by incorporating real-world financial risk checks:
- Geographical risks (location mismatches)
- Transaction types (unusual behavior like structuring/cash layering)
- Product types (cash-intensive or high-risk financial instruments)
- Counterparty risks (sanctions/PEP)
Most importantly, we’ll connect this to a live sanctions list — the OFAC Specially Designated Nationals (SDN) List published by the U.S. Treasury — to demonstrate how banks can operationalize compliance checks in real time.
Why This Matters
In today’s complex financial landscape, organizations face significant challenges detecting suspicious transactions that may indicate fraud or involvement with sanctioned entities. Traditional rule-based checks often fall short when dealing with ambiguous or evolving risk patterns. This blog post presents a unified pipeline that combines fuzzy sanctions screening, contextual anomaly detection, and generative AI explanations for enhanced financial crime investigation.
Problem Statement
A Unified AI-Powered Pipeline
1. Fetching and Preparing the OFAC SDN Sanctions List
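import io

import requests
import pandas as pd

OFAC_SDN_URL = "https://www.treasury.gov/ofac/downloads/sdn.csv"

def fetch_ofac_sdn(url: str = OFAC_SDN_URL) -> pd.DataFrame:
    # Download the SDN list and fail fast on HTTP errors.
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    data = resp.content.decode("utf-8", errors="ignore")
    # The file has no header row; column 1 holds the entity name.
    df = pd.read_csv(io.StringIO(data), header=None)
    # Drop empty names first so they do not turn into the literal string "NAN".
    names = df[1].dropna().astype(str).str.upper().str.strip()
    # Deduplicate and sort so the row order is reproducible across runs.
    sanction_list = sorted(names.unique())
    sanction_df = pd.DataFrame(sanction_list, columns=["Name"])
    return sanction_df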
Explanation:
This function dynamically fetches the latest Specially Designated Nationals (SDN) list from the US Treasury website. It reads the CSV response, converts all entity names to uppercase for case-insensitive matching, strips whitespace, and removes duplicate entries. The resulting DataFrame provides a clean, standardized sanction list to be used downstream.
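The cleaning steps can be tried offline, without contacting the Treasury server. The sketch below applies the same uppercase, strip, and de-duplicate logic to a few made-up rows laid out like the headerless file the function expects (entity number in column 0, name in column 1). The sample rows and the helper name `clean_sdn_names` are illustrative assumptions, not real SDN entries or part of the pipeline above.

```python
import csv
import io

# Made-up rows in the headerless layout the pipeline expects:
# column 0 = entity number, column 1 = name. These are NOT real SDN entries.
SAMPLE_CSV = '''1001,"Example Trading Co "
1002,"example trading co"
1003,"SAMPLE SHIPPING LLC"
'''

def clean_sdn_names(csv_text: str) -> list[str]:
    """Return sorted, de-duplicated, uppercased names taken from column 1."""
    names = set()
    for row in csv.reader(io.StringIO(csv_text)):
        # Skip short rows and rows whose name field is blank.
        if len(row) > 1 and row[1].strip():
            names.add(row[1].upper().strip())
    return sorted(names)

print(clean_sdn_names(SAMPLE_CSV))
```

The first two rows differ only in case and trailing whitespace, so they collapse into a single entry after cleaning.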
2. Fuzzy Sanctions Screening with Name Normalization
import re

from rapidfuzz import fuzz, process

def normalize_party_name(s: str) -> str:
    # Uppercase, replace punctuation (except hyphen and ampersand) with spaces,
    # then collapse runs of whitespace.
    s = (s or "").upper()
    s = re.sub(r"[^\w\s\-&]", " ", s)
    s = re.sub(r"\s+", " ", s).strip()
    return s

def check_sanctions_fuzzy(counterparty: str, sdn_df: pd.DataFrame, threshold: int = 85):
    # Returns (hit, matched_name, score); a hit requires score >= threshold.
    if not counterparty:
        return False, None, 0
    query = normalize_party_name(counterparty)
    choices = sdn_df["Name"].tolist()
    match = process.extractOne(query, choices, scorer=fuzz.token_sort_ratio)
    if match:
        matched_name, score, _ = match
        if score >= threshold:
            return True, matched_name, int(score)
    return False, None, 0
Explanation:
Exact string matching is often too rigid because of name variations, typos, or different word orders. To address this, counterparty names are normalized by uppercasing and replacing special characters (other than hyphens and ampersands) with spaces. The rapidfuzz library then applies fuzzy matching using the token sort ratio, which is insensitive to token order, and reports a sanction hit when the similarity score meets or exceeds the threshold (default 85 on a 0 to 100 scale). Compared with exact matching, this reduces false negatives in sanction screening, at the cost of more near-matches for an analyst to review.
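To see why sorting tokens makes the comparison order-insensitive, here is a minimal standard-library sketch. It uses `difflib.SequenceMatcher` as a stand-in for rapidfuzz, so its scores will not match `fuzz.token_sort_ratio` exactly; the function name `token_sort_similarity` is an assumption made for illustration.

```python
import re
from difflib import SequenceMatcher

def normalize(s: str) -> str:
    """Same normalization as the pipeline: uppercase, strip punctuation, collapse spaces."""
    s = (s or "").upper()
    s = re.sub(r"[^\w\s\-&]", " ", s)
    return re.sub(r"\s+", " ", s).strip()

def token_sort_similarity(a: str, b: str) -> float:
    """Sort the tokens of each name, then score the sorted strings on a 0-100 scale."""
    a_sorted = " ".join(sorted(normalize(a).split()))
    b_sorted = " ".join(sorted(normalize(b).split()))
    return 100.0 * SequenceMatcher(None, a_sorted, b_sorted).ratio()

# Same tokens in a different order: identical after sorting.
print(token_sort_similarity("Melli Iran Bank", "BANK MELLI IRAN"))
# One dropped letter: still well above a threshold of 85.
print(token_sort_similarity("Bank Meli Iran", "BANK MELLI IRAN"))
# An unrelated name scores far lower.
print(token_sort_similarity("LocalMerchant", "BANK MELLI IRAN"))
```

The threshold then becomes a tuning knob: raising it trades fewer false positives for more missed variations.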
3. Contextual Anomaly Detection Using Isolation Forest
from datetime import datetime

from sklearn.ensemble import IsolationForest

def _extract_hour(ts: str) -> int:
    # Try the expected formats first, then fall back to pandas' parser.
    if not ts:
        return 0
    for fmt in ("%Y-%m-%d %H:%M", "%Y-%m-%d %H:%M:%S", "%H:%M", "%H:%M:%S"):
        try:
            return datetime.strptime(ts, fmt).hour
        except ValueError:
            continue
    try:
        parsed = pd.to_datetime(ts, errors="coerce")
    except Exception:
        return 0
    # NaT.hour is NaN, and `NaN or 0` evaluates to NaN, so check explicitly.
    return 0 if pd.isna(parsed) else int(parsed.hour)

def _encode_categories(df: pd.DataFrame, cols):
    # Replace each categorical column with its integer category codes.
    out = df.copy()
    for c in cols:
        out[c] = out[c].astype("category").cat.codes
    return out

def detect_anomalies(transactions: pd.DataFrame, contamination: float = 0.05) -> pd.DataFrame:
    df = transactions.copy()
    df["hour"] = df["time"].apply(_extract_hour)
    # Add any missing categorical column with a constant placeholder.
    for col in ["location", "device", "purpose", "txn_type"]:
        if col not in df.columns:
            df[col] = "UNKNOWN"
    enc_df = _encode_categories(df, ["location", "device", "purpose", "txn_type"])
    feats = enc_df[["amount", "hour", "location", "device", "purpose", "txn_type"]].astype(float)
    # Standardize amount (population std) unless the column is constant.
    if feats["amount"].std(ddof=0) > 0:
        feats["amount"] = (feats["amount"] - feats["amount"].mean()) / feats["amount"].std(ddof=0)
    model = IsolationForest(contamination=contamination, random_state=42)
    preds = model.fit_predict(feats)
    scores = model.decision_function(feats)
    df["anomaly_flag"] = (preds == -1)
    # decision_function is lower for outliers; negate so higher means more anomalous.
    df["anomaly_score"] = -scores
    return df
Explanation:
This part implements anomaly detection using the Isolation Forest algorithm. It first derives the hour from each transaction timestamp to capture temporal patterns. Any categorical column missing from the input (location, device, purpose, transaction type) is filled with the placeholder "UNKNOWN", and all four columns are then encoded as integer category codes. The transaction amount is standardized to zero mean and unit variance, which keeps it on a scale comparable to the encoded columns. Isolation Forest isolates rare transactions that differ significantly from the majority and flags them as anomalies. The model outputs both a binary anomaly flag and an anomaly score; the score is the negated decision function, so higher values mean more anomalous.
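The two preprocessing steps can be reproduced with the standard library to make them concrete. This is a sketch rather than the pipeline's code: `standardize` and `category_codes` are hypothetical helper names, and the inputs are the amounts and locations of the sample transactions used later in this post.

```python
from statistics import mean, pstdev

# Amounts and locations of the six sample transactions in this post.
amounts = [48000, 5000, 200000, 1200, 750000, 30000]
locations = ["Mumbai", "Delhi", "London", "Tehran", "NYC", "Moscow"]

def standardize(values):
    """Z-score using the population standard deviation (the ddof=0 convention)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [float(v) for v in values]  # constant column: leave unchanged
    return [(v - mu) / sigma for v in values]

def category_codes(values):
    """Map each distinct value to an integer, numbered in sorted order."""
    lookup = {v: i for i, v in enumerate(sorted(set(values)))}
    return [lookup[v] for v in values]

z = standardize(amounts)
codes = category_codes(locations)
print([round(v, 2) for v in z])  # the 750000 withdrawal has the largest z-score
print(codes)
```

Note that integer codes impose an arbitrary ordering on the categories (here, alphabetical), so the numeric distance between two locations carries no meaning; one-hot encoding is a common alternative when that matters.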
4. Generative AI Explanations with Local LLM (Ollama)
from langchain_ollama.llms import OllamaLLM
from langchain_core.prompts import ChatPromptTemplate

def make_llm(model_name: str = "llama3") -> OllamaLLM:
    return OllamaLLM(model=model_name, temperature=0.2)

EXPLAIN_PROMPT = ChatPromptTemplate.from_messages([
    ("system",
     "You are a senior financial crime analyst. Be precise, concise, and actionable. "
     "Explain why a transaction might be risky using the provided evidence. "
     "Call out sanction matches, anomaly drivers (amount, time, device, location, txn_type, purpose), "
     "and any structuring/country risk signals. Provide a recommended next action."),
    ("human",
     "Transaction: {txn}\n"
     "Flags: anomaly_flag={anomaly_flag}, anomaly_score={anomaly_score}, "
     "sanction_hit={sanction_hit}, matched_name={matched_name}, match_score={match_score}\n\n"
     "Explain the risk and recommend next steps.")
])

def genai_explain(llm: OllamaLLM, txn_row: dict, sanction_hit: bool, matched_name: str, match_score: int) -> str:
    msg = EXPLAIN_PROMPT.format(
        txn=txn_row,
        anomaly_flag=txn_row.get("anomaly_flag", False),
        anomaly_score=round(float(txn_row.get("anomaly_score", 0)), 3),
        sanction_hit=sanction_hit,
        matched_name=matched_name or "N/A",
        match_score=match_score
    )
    return llm.invoke(msg)
Explanation:
This component integrates with a local large language model (LLM) through Ollama to generate human-readable explanations. The prompt instructs the model to act as a senior financial crime analyst, focusing on clarity and actionable insights. For each flagged transaction, the model is asked to call out sanction hits, anomaly drivers, and any relevant contextual risk signals, and to recommend a next action. These concise risk narratives are intended to help investigators reach decisions faster and more consistently.
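It can help to inspect what the model actually receives before wiring in an LLM. The sketch below renders the human message with plain `str.format` as a stand-in for the prompt template, so it runs without LangChain or Ollama installed. The helper name `render_human_message` and the sample anomaly score are assumptions made for illustration.

```python
# Same wording as the "human" part of the prompt above.
HUMAN_TEMPLATE = (
    "Transaction: {txn}\n"
    "Flags: anomaly_flag={anomaly_flag}, anomaly_score={anomaly_score}, "
    "sanction_hit={sanction_hit}, matched_name={matched_name}, match_score={match_score}\n\n"
    "Explain the risk and recommend next steps."
)

def render_human_message(txn_row: dict, sanction_hit: bool, matched_name, match_score: int) -> str:
    """Fill the template the same way genai_explain does: round the score, default the name."""
    return HUMAN_TEMPLATE.format(
        txn=txn_row,
        anomaly_flag=txn_row.get("anomaly_flag", False),
        anomaly_score=round(float(txn_row.get("anomaly_score", 0)), 3),
        sanction_hit=sanction_hit,
        matched_name=matched_name or "N/A",
        match_score=match_score,
    )

# Hypothetical flagged row; the anomaly score is a made-up value.
row = {"id": "TX1005", "amount": 750000, "anomaly_flag": True, "anomaly_score": 0.12345}
message = render_human_message(row, sanction_hit=False, matched_name=None, match_score=0)
print(message)
```

Because the whole transaction dictionary is embedded in the message, any field present in the row (including ones not meant for the model) ends up in the prompt, which is worth keeping in mind when choosing which columns to pass.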
5. Unified Fraud Detection Pipeline
def fraud_pipeline_with_explanations(transactions: pd.DataFrame,
                                     sdn_df: pd.DataFrame,
                                     llm: OllamaLLM,
                                     sanction_threshold: int = 85,
                                     contamination: float = 0.05) -> pd.DataFrame:
    enriched = detect_anomalies(transactions, contamination=contamination)
    results = []
    for _, row in enriched.iterrows():
        txn = row.to_dict()
        hit, matched, score = check_sanctions_fuzzy(txn.get("counterparty", ""), sdn_df, threshold=sanction_threshold)
        if row["anomaly_flag"] or hit:
            explanation = genai_explain(llm, txn, hit, matched, score)
            results.append({
                "id": txn.get("id"),
                "amount": txn.get("amount"),
                "location": txn.get("location"),
                "device": txn.get("device"),
                "time": txn.get("time"),
                "purpose": txn.get("purpose"),
                "txn_type": txn.get("txn_type"),
                "counterparty": txn.get("counterparty"),
                "anomaly_flag": bool(txn.get("anomaly_flag")),
                "anomaly_score": float(txn.get("anomaly_score", 0)),
                "sanction_hit": hit,
                "sanction_match": matched,
                "sanction_match_score": score,
                "genai_explanation": explanation.strip()
            })
    return pd.DataFrame(results)
Explanation:
This function ties all components together. It first performs anomaly detection on the input transactions, then applies fuzzy sanctions checks on the counterparties. Any transaction flagged as anomalous or matching a sanction triggers generation of a detailed AI explanation. The output is a comprehensive DataFrame containing all flagged transactions enriched with anomaly and sanction flags, scores, and descriptive narratives for analyst review.
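The gating logic (explain a row only if it is anomalous or a sanction hit) can be exercised without a model by passing in stand-ins. In this sketch, `run_pipeline`, `stub_explain`, the toy rows, and the exact-match sanction check are all hypothetical placeholders for the real components.

```python
def run_pipeline(rows, is_sanction_hit, explain):
    """Keep rows that are anomalous or sanction hits, and explain only those."""
    results = []
    for row in rows:
        hit = is_sanction_hit(row.get("counterparty", ""))
        if row.get("anomaly_flag") or hit:
            results.append({**row, "sanction_hit": hit, "genai_explanation": explain(row, hit)})
    return results

calls = []  # records which rows reached the (stubbed) explainer

def stub_explain(row, hit):
    """Stand-in for the LLM call; returns a fixed-format string."""
    calls.append(row["id"])
    return f"stub explanation for {row['id']}"

# Toy rows with precomputed anomaly flags (made-up data).
rows = [
    {"id": "A", "counterparty": "Clean Co", "anomaly_flag": False},
    {"id": "B", "counterparty": "Listed Co", "anomaly_flag": False},
    {"id": "C", "counterparty": "Clean Co", "anomaly_flag": True},
]

flagged = run_pipeline(rows, lambda name: name == "Listed Co", stub_explain)
print([r["id"] for r in flagged])
```

Since the LLM call is typically the slowest step, restricting it to flagged rows keeps the cost proportional to the number of alerts rather than the number of transactions.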
6. Example Usage with Sample Transactions
if __name__ == "__main__":
    import pandas as pd

    transactions = pd.DataFrame([
        {"id": "TX1001", "amount": 48000, "location": "Mumbai", "device": "iPhone-15", "time": "2025-08-17 03:15", "purpose": "merchant_payment", "txn_type": "UPI", "counterparty": "LocalMerchant"},
        {"id": "TX1002", "amount": 5000, "location": "Delhi", "device": "Samsung-S22", "time": "2025-08-17 14:20", "purpose": "groceries", "txn_type": "Card", "counterparty": "GlobalSupplier"},
        {"id": "TX1003", "amount": 200000, "location": "London", "device": "Unknown", "time": "2025-08-17 01:45", "purpose": "wire_transfer", "txn_type": "Wire", "counterparty": "BannedEntity"},
        {"id": "TX1004", "amount": 1200, "location": "Tehran", "device": "mobile", "time": "2025-08-17 14:00", "purpose": "family_support", "txn_type": "Transfer", "counterparty": "Bank Melli Iran"},
        {"id": "TX1005", "amount": 750000, "location": "NYC", "device": "desktop", "time": "2025-08-17 03:00", "purpose": "cash_withdrawal", "txn_type": "ATM", "counterparty": "John Doe"},
        # fuzzy hit vs ALFA-BANK
        {"id": "TX1006", "amount": 30000, "location": "Moscow", "device": "web", "time": "2025-08-17 22:10", "purpose": "transfer", "txn_type": "Wire", "counterparty": "Alfa Bank JSC"}
    ])

    print("Fetching OFAC SDN list...")
    sdn_df = fetch_ofac_sdn()

    print("Loading local LLM model...")
    llm = make_llm(model_name="gemma3:4b")

    print("Running fraud and sanction screening pipeline...")
    flagged = fraud_pipeline_with_explanations(
        transactions=transactions,
        sdn_df=sdn_df,
        llm=llm,
        sanction_threshold=85,
        contamination=0.06
    )

    pd.set_option("display.max_colwidth", 200)
    print("\n=== Flagged Transactions with GenAI Explanations ===")
    print(flagged)
Real-World Example
The pipeline identifies key suspicious transactions among the samples:
Sanction Hit: TX1004
This transaction involves "Bank Melli Iran," a known sanctioned entity. The fuzzy matcher flags it even when the name is written with variations in case, punctuation, or word order, which exact matching would miss.
Anomaly Detection: TX1005
A large cash withdrawal of 750,000 in NYC at 3 AM on a desktop device is flagged as anomalous by the Isolation Forest model, highlighting potential fraud or money laundering.
In both cases, the local AI model produces clear, actionable explanations that help human investigators efficiently understand risks and prioritize follow-up action.
Conclusion
This example demonstrates a practical and transparent approach to financial crime detection that leverages open-source tools and local AI, ensuring data privacy without sacrificing analytical power.
✍️ Author’s Note
This blog reflects the author’s personal point of view — shaped by 22+ years of industry experience, along with a deep passion for continuous learning and teaching.
The content has been phrased and structured using Generative AI tools, with the intent to make it engaging, accessible, and insightful for a broader audience.