GenAI in Banking & Finance: Post 2

100% On-Premise, 100% Open-Source Sanctions Screening and Fraud Detection Pipeline with AI-Powered Explanations

In the first post of this series, we explored how Generative AI (GenAI) is redefining fraud detection, shifting from static, rule-based filters to dynamic, context-aware intelligence. Today, let’s go deeper with a hands-on example: building a fraud detection and sanctions screening pipeline that not only detects issues but also explains them using GenAI.

We’ll extend the classic anomaly detection setup by incorporating real-world financial risk checks:

Geographical risks (location mismatches)
Transaction types (unusual behavior like structuring/cash layering)
Product types (cash-intensive or high-risk financial instruments)
Counterparty risks (sanctions/PEP)

Most importantly, we’ll connect this to a live sanctions list — the OFAC Specially Designated Nationals (SDN) List published by the U.S. Treasury — to demonstrate how banks can operationalize compliance checks in real time.

Why This Matters

In today’s complex financial landscape, organizations face significant challenges detecting suspicious transactions that may indicate fraud or involvement with sanctioned entities. Traditional rule-based checks often fall short when dealing with ambiguous or evolving risk patterns. This blog post presents a unified pipeline that combines fuzzy sanctions screening, contextual anomaly detection, and generative AI explanations for enhanced financial crime investigation.

Problem Statement

Financial institutions must comply with sanctions issued by regulatory bodies like the U.S. Treasury's Office of Foreign Assets Control (OFAC). These sanctions lists (such as the Specially Designated Nationals, or SDN, list) include names of individuals and organizations whose transactions are prohibited. However, exact name matching often fails because of spelling variations, aliases, and misspellings, so fuzzy matching algorithms are essential.

Additionally, detecting anomalous behavior within transactions requires analyzing contextual data like amount, timestamp, location, device, and transaction purpose. Distinguishing truly suspicious transactions in this high-volume environment is challenging.

Finally, manual review of flagged transactions is time-intensive. Human analysts benefit greatly from AI-generated contextual explanations, which clarify why transactions are flagged and suggest actionable next steps.

A Unified AI-Powered Pipeline

1. Fetching and Preparing the OFAC SDN Sanctions List


import io
import requests
import pandas as pd

OFAC_SDN_URL = "https://www.treasury.gov/ofac/downloads/sdn.csv"

def fetch_ofac_sdn(url: str = OFAC_SDN_URL) -> pd.DataFrame:
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
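    # Decode leniently in case the file contains bytes that are not valid UTF-8.
    data = resp.content.decode("utf-8", errors="ignore")
    # The file is read without a header row; column 1 holds the entity name.
    df = pd.read_csv(io.StringIO(data), header=None)
    df["Name"] = df[1].astype(str).str.upper().str.strip()
    # Deduplicate and sort so the list order is stable between runs
    # (iterating over a Python set gives an arbitrary order).
    names = df["Name"].drop_duplicates().sort_values()
    return pd.DataFrame({"Name": names.to_list()})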

Explanation:
This function dynamically fetches the latest Specially Designated Nationals (SDN) list from the US Treasury website. It reads the CSV response, converts all entity names to uppercase for case-insensitive matching, strips whitespace, and removes duplicate entries. The resulting DataFrame provides a clean, standardized sanction list to be used downstream.
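To exercise the same cleaning steps without a network call, the sketch below applies them to a small in-memory sample using only the standard library. The rows and names are made up for illustration (they are not real SDN entries), and `clean_sdn_names` is a hypothetical helper, not part of the pipeline; it assumes, as the function above does, that the name sits in the second column of a header-less file.

```python
import csv
import io

# Made-up rows in a header-less layout with the name in column 1.
SAMPLE_CSV = '''1,"Example Trading Co ",entity
2,"EXAMPLE TRADING CO",entity
3,"Sample Shipping Ltd",entity
'''

def clean_sdn_names(csv_text: str) -> list[str]:
    """Uppercase, strip, and deduplicate the names found in column 1."""
    names = set()
    for row in csv.reader(io.StringIO(csv_text)):
        # Skip short or empty rows instead of failing on them.
        if len(row) > 1 and row[1].strip():
            names.add(row[1].upper().strip())
    return sorted(names)  # sorted so the order is stable between runs

print(clean_sdn_names(SAMPLE_CSV))
```

The first two rows differ only in case and trailing whitespace, so they collapse into a single entry.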

2. Fuzzy Sanctions Screening with Name Normalization


import re
from rapidfuzz import fuzz, process

def normalize_party_name(s: str) -> str:
    s = (s or "").upper()
    s = re.sub(r"[^\w\s\-&]", " ", s)
    s = re.sub(r"\s+", " ", s).strip()
    return s

def check_sanctions_fuzzy(counterparty: str, sdn_df: pd.DataFrame, threshold: int = 85):
    if not counterparty:
        return False, None, 0
    query = normalize_party_name(counterparty)
    choices = sdn_df["Name"].tolist()
    match = process.extractOne(query, choices, scorer=fuzz.token_sort_ratio)
    if match:
        matched_name, score, _ = match
        if score >= threshold:
            return True, matched_name, int(score)
    return False, None, 0

Explanation:
Exact string matching is often too rigid because of name variations, typos, or different word orders. To address this, counterparty names are normalized by uppercasing and replacing special characters (other than hyphens and ampersands) with spaces. The rapidfuzz library then applies fuzzy matching using the token sort ratio, which is insensitive to token order, and reports a sanction hit when the similarity score meets or exceeds the threshold (default 85 on a 0-100 scale). Compared with exact matching, this should reduce false negatives, at the cost of some false positives that analysts need to review.
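To make the token-sort idea concrete, here is a minimal standard-library sketch. It uses `difflib.SequenceMatcher` as a stand-in scorer, so the numbers will not match rapidfuzz's `token_sort_ratio` exactly; `normalize` and `token_sort_score` are hypothetical names. It also normalizes both sides of the comparison, whereas the function above normalizes only the query, an asymmetry worth considering when list entries contain punctuation.

```python
import re
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    """Uppercase, replace punctuation (except - and &) with spaces, collapse whitespace."""
    name = re.sub(r"[^\w\s\-&]", " ", (name or "").upper())
    return re.sub(r"\s+", " ", name).strip()

def token_sort_score(a: str, b: str) -> float:
    """Similarity on a 0-100 scale, computed after sorting each name's tokens."""
    a_sorted = " ".join(sorted(normalize(a).split()))
    b_sorted = " ".join(sorted(normalize(b).split()))
    return 100.0 * SequenceMatcher(None, a_sorted, b_sorted).ratio()

# Word order and punctuation stop mattering once tokens are sorted.
print(token_sort_score("Iran, Bank Melli", "BANK MELLI IRAN"))
# An unrelated name scores lower and stays under the screening threshold.
print(token_sort_score("LocalMerchant", "BANK MELLI IRAN"))
```

Sorting tokens before comparing is what makes the score insensitive to word order; the choice of character-level scorer underneath is a separate decision.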

3. Contextual Anomaly Detection Using Isolation Forest



from datetime import datetime
from sklearn.ensemble import IsolationForest

def _extract_hour(ts: str) -> int:
    if not ts:
        return 0
    for fmt in ("%Y-%m-%d %H:%M", "%Y-%m-%d %H:%M:%S", "%H:%M", "%H:%M:%S"):
        try:
            return datetime.strptime(ts, fmt).hour
        except ValueError:
            continue
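    # Fall back to pandas' more permissive parser.
    try:
        parsed = pd.to_datetime(ts, errors="coerce")
    except Exception:
        return 0
    # NaT.hour is NaN, and NaN is truthy, so `.hour or 0` would return NaN;
    # check for NaT explicitly instead.
    return 0 if pd.isna(parsed) else int(parsed.hour)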

def _encode_categories(df: pd.DataFrame, cols):
    out = df.copy()
    for c in cols:
        out[c] = out[c].astype("category").cat.codes
    return out

def detect_anomalies(transactions: pd.DataFrame, contamination: float = 0.05) -> pd.DataFrame:
    df = transactions.copy()
    df["hour"] = df["time"].apply(_extract_hour)
    for col in ["location", "device", "purpose", "txn_type"]:
        if col not in df.columns:
            df[col] = "UNKNOWN"
    enc_df = _encode_categories(df, ["location", "device", "purpose", "txn_type"])
    feats = enc_df[["amount", "hour", "location", "device", "purpose", "txn_type"]].astype(float)
    if feats["amount"].std(ddof=0) > 0:
        feats["amount"] = (feats["amount"] - feats["amount"].mean()) / feats["amount"].std(ddof=0)
    model = IsolationForest(contamination=contamination, random_state=42)
    preds = model.fit_predict(feats)
    scores = model.decision_function(feats)
    df["anomaly_flag"] = (preds == -1)
    df["anomaly_score"] = -scores
    return df

Explanation:
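This part implements anomaly detection using the Isolation Forest algorithm. It first derives the hour from transaction timestamps to capture temporal patterns. If a categorical column (location, device, purpose, transaction type) is absent from the input, it is filled with "UNKNOWN"; each categorical column is then encoded as integer category codes. These codes impose an arbitrary ordering on the categories, which is acceptable for a demo, though one-hot encoding may be a better fit for production data. The transaction amount is standardized to zero mean and unit variance; because Isolation Forest splits on one feature at a time, this rescaling mainly keeps the values readable and should not materially change which points are isolated. Isolation Forest isolates rare transactions that differ significantly from the majority, flagging them as anomalies. The model outputs both a binary anomaly flag and an anomaly score, negated here so that higher values mean more anomalous.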
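As a worked example of the standardization step, the sketch below computes z-scores for the six sample amounts used later in this post, using only the standard library. `statistics.pstdev` is the population standard deviation, which corresponds to `std(ddof=0)` in the function above; `zscores` is a hypothetical helper name.

```python
from statistics import fmean, pstdev

def zscores(values: list[float]) -> list[float]:
    """Standardize values to zero mean and unit (population) variance."""
    mean = fmean(values)
    spread = pstdev(values)
    if spread == 0:
        # Mirror the guard in detect_anomalies: leave constant data unchanged.
        return list(values)
    return [(v - mean) / spread for v in values]

# Amounts of TX1001..TX1006 from the sample transactions.
amounts = [48000, 5000, 200000, 1200, 750000, 30000]
z = zscores(amounts)
for amount, score in zip(amounts, z):
    print(f"{amount:>7} -> {score:+.2f}")
```

The 750,000 withdrawal ends up with the largest z-score, which is the kind of separation the amount feature contributes to the model.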

4. Generative AI Explanations with Local LLM (Ollama)


from langchain_ollama.llms import OllamaLLM
from langchain_core.prompts import ChatPromptTemplate

def make_llm(model_name: str = "llama3") -> OllamaLLM:
    return OllamaLLM(model=model_name, temperature=0.2)

EXPLAIN_PROMPT = ChatPromptTemplate.from_messages([
    ("system",
     "You are a senior financial crime analyst. Be precise, concise, and actionable. "
     "Explain why a transaction might be risky using the provided evidence. "
     "Call out sanction matches, anomaly drivers (amount, time, device, location, txn_type, purpose), "
     "and any structuring/country risk signals. Provide a recommended next action."),
    ("human",
     "Transaction: {txn}\n"
     "Flags: anomaly_flag={anomaly_flag}, anomaly_score={anomaly_score}, "
     "sanction_hit={sanction_hit}, matched_name={matched_name}, match_score={match_score}\n\n"
     "Explain the risk and recommend next steps.")
])

def genai_explain(llm: OllamaLLM, txn_row: dict, sanction_hit: bool, matched_name: str, match_score: int) -> str:
    msg = EXPLAIN_PROMPT.format(
        txn=txn_row,
        anomaly_flag=txn_row.get("anomaly_flag", False),
        anomaly_score=round(float(txn_row.get("anomaly_score", 0)), 3),
        sanction_hit=sanction_hit,
        matched_name=matched_name or "N/A",
        match_score=match_score
    )
    return llm.invoke(msg)

Explanation:
This component integrates with a local large language model (LLM) through Ollama to generate human-readable explanations. The prompt instructs the model to act as a senior financial crime analyst, focusing on clarity and actionable insights. For each flagged transaction, it is asked to call out sanction hits, anomaly drivers, and any relevant contextual risk signals. The resulting concise risk narratives are intended to help investigators reach decisions faster, while the flags and scores passed in the prompt remain the underlying evidence.
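One practical detail: the transaction dict is formatted into the prompt as-is, so depending on library versions it may carry NumPy scalars or timestamp objects whose Python `repr` ends up in the prompt text. A minimal sketch of one way to render the transaction as stable JSON first, assuming a hypothetical helper `txn_to_prompt_text` that is not part of the code above:

```python
import json
from datetime import datetime

def txn_to_prompt_text(txn: dict) -> str:
    """Render a transaction dict as compact, key-sorted JSON for a prompt."""
    def _fallback(value):
        # NumPy scalars expose .item(), which returns a native Python value.
        if hasattr(value, "item"):
            return value.item()
        # Anything else (timestamps, custom objects) is rendered as a string.
        return str(value)
    return json.dumps(txn, default=_fallback, sort_keys=True)

example = {"id": "TX1005", "amount": 750000, "time": datetime(2025, 8, 17, 3, 0)}
print(txn_to_prompt_text(example))
```

The returned string could be passed as the `txn` argument when formatting the prompt; sorting the keys keeps the prompt identical for identical transactions.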

5. Unified Fraud Detection Pipeline



def fraud_pipeline_with_explanations(transactions: pd.DataFrame,
                                     sdn_df: pd.DataFrame,
                                     llm: OllamaLLM,
                                     sanction_threshold: int = 85,
                                     contamination: float = 0.05) -> pd.DataFrame:
    enriched = detect_anomalies(transactions, contamination=contamination)
    results = []
    for _, row in enriched.iterrows():
        txn = row.to_dict()
        hit, matched, score = check_sanctions_fuzzy(txn.get("counterparty", ""), sdn_df, threshold=sanction_threshold)
        if row["anomaly_flag"] or hit:
            explanation = genai_explain(llm, txn, hit, matched, score)
            results.append({
                "id": txn.get("id"),
                "amount": txn.get("amount"),
                "location": txn.get("location"),
                "device": txn.get("device"),
                "time": txn.get("time"),
                "purpose": txn.get("purpose"),
                "txn_type": txn.get("txn_type"),
                "counterparty": txn.get("counterparty"),
                "anomaly_flag": bool(txn.get("anomaly_flag")),
                "anomaly_score": float(txn.get("anomaly_score", 0)),
                "sanction_hit": hit,
                "sanction_match": matched,
                "sanction_match_score": score,
                "genai_explanation": explanation.strip()
            })
    return pd.DataFrame(results)

Explanation:
This function ties all components together. It first performs anomaly detection on the input transactions, then applies fuzzy sanctions checks on the counterparties. Any transaction flagged as anomalous or matching a sanction triggers generation of a detailed AI explanation. The output is a comprehensive DataFrame containing all flagged transactions enriched with anomaly and sanction flags, scores, and descriptive narratives for analyst review.
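Because the output is meant for analyst review, it can help to order it. The sketch below shows one possible triage rule, which the pipeline above does not apply itself: sanction hits first, then descending anomaly score. It works on plain dicts with the same keys as the result rows; `rank_for_review` is a hypothetical name and the sample values are invented.

```python
def rank_for_review(rows: list[dict]) -> list[dict]:
    """Order flagged rows: sanction hits first, then highest anomaly score."""
    return sorted(
        rows,
        key=lambda r: (
            not r.get("sanction_hit", False),      # False sorts first, so hits lead
            -float(r.get("anomaly_score", 0.0)),   # higher score means more anomalous
        ),
    )

# Invented rows for illustration only.
rows = [
    {"id": "A", "sanction_hit": False, "anomaly_score": 0.12},
    {"id": "B", "sanction_hit": True, "anomaly_score": -0.05},
    {"id": "C", "sanction_hit": False, "anomaly_score": 0.30},
]
print([r["id"] for r in rank_for_review(rows)])
```

With a DataFrame, the same ordering could be obtained by sorting on the `sanction_hit` and `anomaly_score` columns.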

6. Example Usage with Sample Transactions



if __name__ == "__main__":
    import pandas as pd

    transactions = pd.DataFrame([
        {"id": "TX1001", "amount": 48000,  "location": "Mumbai", "device": "iPhone-15",  "time": "2025-08-17 03:15",
         "purpose": "merchant_payment", "txn_type": "UPI",          "counterparty": "LocalMerchant"},
        {"id": "TX1002", "amount": 5000,   "location": "Delhi",  "device": "Samsung-S22","time": "2025-08-17 14:20",
         "purpose": "groceries",          "txn_type": "Card",        "counterparty": "GlobalSupplier"},
        {"id": "TX1003", "amount": 200000, "location": "London", "device": "Unknown",    "time": "2025-08-17 01:45",
         "purpose": "wire_transfer",      "txn_type": "Wire",        "counterparty": "BannedEntity"},
        {"id": "TX1004", "amount": 1200,   "location": "Tehran", "device": "mobile",     "time": "2025-08-17 14:00",
         "purpose": "family_support",     "txn_type": "Transfer",    "counterparty": "Bank Melli Iran"},
        {"id": "TX1005", "amount": 750000, "location": "NYC",    "device": "desktop",    "time": "2025-08-17 03:00",
         "purpose": "cash_withdrawal",    "txn_type": "ATM",         "counterparty": "John Doe"},
        {"id": "TX1006", "amount": 30000,  "location": "Moscow", "device": "web",        "time": "2025-08-17 22:10",
         "purpose": "transfer",           "txn_type": "Wire",        "counterparty": "Alfa Bank JSC"} # intended fuzzy candidate vs ALFA-BANK; the score depends on the current list entry
    ])

    print("Fetching OFAC SDN list...")
    sdn_df = fetch_ofac_sdn()

    print("Loading local LLM model...")
    llm = make_llm(model_name="gemma3:4b")

    print("Running fraud and sanction screening pipeline...")
    flagged = fraud_pipeline_with_explanations(
        transactions=transactions,
        sdn_df=sdn_df,
        llm=llm,
        sanction_threshold=85,
        contamination=0.06
    )

    pd.set_option("display.max_colwidth", 200)
    print("\n=== Flagged Transactions with GenAI Explanations ===")
    print(flagged)

Real-World Example

The pipeline identifies key suspicious transactions among the samples:

Sanction Hit: TX1004

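This transaction involves "Bank Melli Iran," a sanctioned entity on the SDN list. The screening step uppercases and normalizes the counterparty name before comparing it, and the fuzzy matcher is designed to catch the entity even when the name is reordered or slightly misspelled. This tolerance is what makes the compliance check more robust than exact matching.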

Anomaly Detection: TX1005

A large cash withdrawal of 750,000 in NYC at 3 AM on a desktop device is flagged as anomalous by the Isolation Forest model, highlighting potential fraud or money laundering.

In both cases, the local AI model produces clear, actionable explanations that help human investigators efficiently understand risks and prioritize follow-up action.

Conclusion

This example demonstrates a practical and transparent approach to financial crime detection that leverages open-source tools and local AI, ensuring data privacy without sacrificing analytical power.

✍️ Author’s Note

This blog reflects the author’s personal point of view — shaped by 22+ years of industry experience, along with a deep passion for continuous learning and teaching.
The content has been phrased and structured using Generative AI tools, with the intent to make it engaging, accessible, and insightful for a broader audience.
