The Compliance Monitor: AI-Powered Transaction Surveillance for German Banking

James Aspinwall — February 2026


This is the flagship agent in the Solaris demo — a real-time transaction monitor that scans payment streams for money laundering patterns and drafts Suspicious Activity Reports. It is also the template for the other four agents: ingest data, detect patterns, generate narrative, present to a human for decision.

What follows is a deep-dive into what it takes to build this agent properly — not as a toy, but as something a BaFin examiner could look at without flinching.


What the Agent Does

A rule engine evaluates a stream of banking transactions against known suspicious typologies. When a pattern matches, the LLM receives the flagged transactions with full context — customer profile, historical behavior, regulatory thresholds — and drafts a Verdachtsmeldung (SAR) narrative. The narrative explains what happened, why it is suspicious, and which regulatory obligation it triggers.

The compliance officer reviews, edits, and files. The agent drafts; the human decides.


The Regulatory Landscape

German AML compliance sits at the intersection of three regulatory layers.

German law — GwG (Geldwäschegesetz): the Anti-Money Laundering Act. Section 43 establishes the suspicious activity reporting obligation; Section 10 sets the customer due diligence duties.

EU directives — the AML package: the successive Anti-Money Laundering Directives (through AMLD6) and the 2024 AML package, including the single-rulebook regulation and the new EU authority, AMLA.

FATF guidance: the Financial Action Task Force's 40 Recommendations and typology reports, which define the patterns supervisors expect monitoring systems to detect.

Penalties are real. Section 56 GwG allows fines up to EUR 5 million or 10% of annual group turnover for serious breaches. N26 had its growth capped by BaFin in 2021 for inadequate AML controls. Deutsche Bank paid approximately EUR 16 million in 2020. The compliance officer can face personal criminal liability.


Input: The Transaction Stream

The agent ingests transaction data with these fields per record:

Field                             Purpose
transaction_id                    Unique reference
amount, currency                  Value and denomination
sender_id, receiver_id            Customer identifiers
sender_country, receiver_country  Geographic risk signals
timestamp                         Temporal pattern detection
account_type                      Product risk factor
account_dormancy_days             Dormant account detection
transaction_type                  Cash, wire, securities, etc.
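A single record with these fields might look like the following (synthetic values; the exact serialization is an assumption, not a fixed schema):

```python
# One illustrative transaction record with the fields from the table above.
record = {
    "transaction_id": "TX-000187",
    "amount": 9750.00,
    "currency": "EUR",
    "sender_id": "DE-4472",
    "receiver_id": "DE-9031",
    "sender_country": "DE",
    "receiver_country": "CY",
    "timestamp": "2026-02-20T14:30:00Z",
    "account_type": "retail_checking",
    "account_dormancy_days": 0,
    "transaction_type": "wire",
}
```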

For the demo: 10,000 synthetic records with 50-100 seeded suspicious patterns across three categories.


Processing: The Detection Pipeline

The pipeline has four stages.

Stage 1 — Sanctions Screening (< 100ms)

Before any transaction executes, it runs through sanctions checks. Hard matches freeze the transaction immediately. This is a gatekeeper, not the compliance monitor’s primary job, but it must exist.

Stage 2 — Rule Engine (< 5 minutes)

Pattern matching against known typologies:

Structuring (smurfing): Multiple transactions from the same sender, each EUR 9,500–9,900 (just below the EUR 10,000 reporting threshold), within a 48-hour window. The rule aggregates linked transactions and flags when the total exceeds the threshold while individual amounts stay under it.

Rapid cross-border flows: Funds moving through 3+ countries in under 24 hours, amounts above EUR 50,000, with no apparent business rationale matching the customer profile.

Dormant account activation: Accounts inactive for 180+ days suddenly receiving or sending large transfers. Especially suspicious when combined with changes to account signatories or contact information.
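The structuring rule above can be sketched as follows — a minimal illustration, not the production rule. Field names follow the input table; the thresholds come from the rule description:

```python
from collections import defaultdict
from datetime import datetime, timedelta

THRESHOLD = 10_000            # EUR reporting threshold
BAND = (9_500, 9_900)         # just-below-threshold band from the rule above
WINDOW = timedelta(hours=48)

def detect_structuring(transactions):
    """Flag senders whose just-below-threshold transactions aggregate past
    the reporting threshold inside a 48-hour window."""
    by_sender = defaultdict(list)
    for tx in transactions:
        if BAND[0] <= tx["amount"] <= BAND[1]:
            by_sender[tx["sender_id"]].append(tx)

    alerts = []
    for sender, txs in by_sender.items():
        txs.sort(key=lambda t: t["timestamp"])
        for i, first in enumerate(txs):
            # aggregate every linked transaction within 48h of this one
            linked = [t for t in txs[i:]
                      if t["timestamp"] - first["timestamp"] <= WINDOW]
            total = sum(t["amount"] for t in linked)
            if len(linked) >= 2 and total > THRESHOLD:
                alerts.append({"sender_id": sender,
                               "count": len(linked),
                               "total": total})
                break  # one alert per sender is enough here
    return alerts
```

The individual amounts stay legal on their own; only the aggregation over the 48-hour window exposes the pattern, which is why the rule must group by sender before comparing against the threshold.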

Additional typologies to implement:

Stage 3 — ML Risk Scoring

Flagged transactions receive an ML-generated risk score (0-100). The model considers customer profile deviation, counterparty risk, geographic risk factors, and temporal patterns. This score prioritizes the analyst workqueue — highest risk alerts surface first.

The industry false positive rate for rule-based monitoring is 95-99%. A good ML layer can bring this down to 85-90%. Even that improvement is transformative: it means analysts spend their time on real risks instead of clearing noise.
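A deployed system would use a trained model; as a sketch only, the four factor families from Stage 3 can be combined into a 0-100 score like this (weights and feature names are assumptions, not the actual model):

```python
# Illustrative only: weights and feature names are hypothetical.
def risk_score(features):
    """Combine the Stage 3 factor families into a 0-100 priority score."""
    score = (40 * min(features["profile_deviation"], 1.0)  # vs. customer baseline
             + 25 * features["counterparty_risk"]          # 0-1 counterparty rating
             + 20 * features["geo_risk"]                   # 0-1 jurisdiction risk
             + 15 * features["temporal_anomaly"])          # 0-1 timing irregularity
    return round(min(score, 100.0))
```

Whatever form the real model takes, its output must remain decomposable into per-feature contributions, since the explainability requirements discussed later rule out opaque scores.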

Stage 4 — LLM Narrative Generation

For each alert that survives scoring, the LLM receives the flagged transactions with full context: the customer profile, the customer's historical behavior, and the applicable regulatory thresholds.

The LLM drafts a Verdachtsmeldung narrative: what happened, why it is suspicious, which GwG section or FATF typology applies, and what the recommended action is.
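Assembling that context might look like the sketch below. Field names and the prompt layout are illustrative assumptions; the point is that the LLM sees the rule citation, the customer baseline, and the raw transactions together:

```python
def build_narrative_prompt(alert, customer, transactions):
    """Assemble the context the LLM sees before drafting the narrative.
    Field names are illustrative, mirroring the input table earlier."""
    tx_lines = "\n".join(
        f"- {t['timestamp']}  {t['amount']:.2f} {t['currency']}  "
        f"{t['sender_country']} -> {t['receiver_country']}"
        for t in transactions
    )
    return (
        f"Rule triggered: {alert['rule_id']} (risk score {alert['risk_score']}/100)\n"
        f"Customer {customer['id']}: historical monthly average "
        f"EUR {customer['monthly_avg']:.2f}\n"
        "Reporting threshold: EUR 10,000\n"
        f"Flagged transactions:\n{tx_lines}\n\n"
        "Draft a Verdachtsmeldung narrative covering: what happened, why it "
        "is suspicious, the applicable GwG section or FATF typology, and the "
        "recommended action."
    )
```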


The Verdachtsmeldung (SAR)

A German SAR filed with the FIU via the goAML platform must contain:

Reporting entity: Bank name, BaFin registration number, MLRO name and contact.

Subject: Full name, DOB, nationality, address, ID document numbers, account numbers/IBANs, tax ID.

Transactions: Date, time, amount, currency, source/destination, counterparty details, SWIFT/BIC codes, transaction type.

Narrative: This is where the LLM adds value. A detailed explanation of why the activity is suspicious, the deviation from normal behavior, the typology classification, the regulatory obligation triggered, and the recommended action. The narrative must reflect the compliance officer’s professional judgment — the LLM drafts, the officer owns.

Filing timeline: Report filed with FIU Germany via goAML. 3-day transaction freeze begins. FIU can extend by 30 days. All records retained for 5 years minimum.
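The timeline arithmetic is simple but worth encoding once; a sketch, using calendar days as stated above (production code would apply the FIU's working-day rules):

```python
from datetime import date, timedelta

def freeze_end(filed_on: date, fiu_extended: bool = False) -> date:
    """End of the transaction freeze: 3 days after filing,
    plus 30 more if the FIU extends."""
    return filed_on + timedelta(days=3 + (30 if fiu_extended else 0))
```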


Human-in-the-Loop: Non-Negotiable

BaFin and the EU AI Act are unambiguous: AI assists, humans decide.

Must be human decisions: whether to file or dismiss a Verdachtsmeldung, the final wording of the filed narrative, case escalation, and the release of a frozen transaction.

Can be automated: alert generation and routing, sanctions screening of hard matches, ML risk scoring, narrative drafting, and audit logging.

The audit trail captures every step: alert generated (which rule, which data, when), alert assigned (to whom, when), investigation actions (what was reviewed), decision (file/dismiss/escalate, by whom, with what rationale). Five-year retention minimum.
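Each of those steps can emit one append-only record; a minimal sketch, with an assumed schema:

```python
from datetime import datetime, timezone

def audit_event(alert_id, event, actor, detail):
    """One append-only audit record; every pipeline step emits one.
    The schema is illustrative, not a fixed standard."""
    return {
        "alert_id": alert_id,
        "event": event,    # generated | assigned | investigated | decided
        "actor": actor,    # rule id, analyst id, or MLRO id
        "detail": detail,  # e.g. rule citation, review notes, decision rationale
        "at": datetime.now(timezone.utc).isoformat(),
    }
```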


Alerting and Monitoring

Alert severity drives response times:

Level     Threshold                                     Response
Critical  Sanctions match, large structuring pattern    Immediate MLRO notification, transaction freeze
High      Multi-typology match, high-risk jurisdiction  Same-day review, escalation to senior compliance
Medium    Single-rule trigger, moderate risk score      24-48 hour review window
Low       Marginal pattern, low risk score              Batch review, weekly
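The tier assignment reduces to a short cascade; a sketch with assumed flag names and an illustrative score cutoff:

```python
def severity(alert):
    """Map an alert to a response tier (flag names and the 50-point
    medium cutoff are illustrative assumptions)."""
    if alert.get("sanctions_match") or alert.get("large_structuring"):
        return "critical"
    if alert.get("typology_count", 0) >= 2 or alert.get("high_risk_jurisdiction"):
        return "high"
    if alert.get("risk_score", 0) >= 50:
        return "medium"
    return "low"
```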

Industry benchmarks: large banks generate 5,000-50,000 alerts per month. Each alert takes 30-120 minutes to investigate. L1 analysts handle 10-20 alerts/day. L2 investigators handle 3-8 complex cases/day. The ML scoring layer’s primary value is reducing the volume that reaches human eyes.


Compliance: EU AI Act

AML monitoring systems are classified as high-risk AI under the EU AI Act (Regulation 2024/1689, Annex III). Deadline for compliance: August 2, 2026.

Requirements: a risk management system, data governance, technical documentation, automatic logging, transparency toward users, human oversight, and accuracy, robustness, and cybersecurity (Articles 9-15).

The human oversight requirement aligns directly with the MLRO review mandate. The explainability requirement means black-box models are insufficient — every alert must be traceable to specific rules, features, and data points. SHAP values or similar explainability frameworks are required.


Testing and Validation

Detection Rule Validation

Above-the-line testing: Inject known suspicious patterns into the transaction stream. Verify every seeded typology triggers an alert. Document: pattern tested, expected outcome, actual outcome, pass/fail.
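Seeding a known pattern can be as simple as the sketch below (field names follow the input table; the helper is an illustration of the injection step, not a full test harness):

```python
import random
from datetime import datetime, timedelta

def seed_structuring(stream, sender_id, start, n=8):
    """Inject a known structuring pattern into a synthetic stream for
    above-the-line testing; the detection rule must alert on it."""
    for i in range(n):
        stream.append({
            "sender_id": sender_id,
            "amount": round(random.uniform(9_500, 9_900), 2),  # below threshold
            "timestamp": start + timedelta(hours=4 * i),       # inside 48h window
        })
    return stream
```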

Below-the-line testing: Sample transactions that did NOT trigger alerts. Verify they are genuinely non-suspicious. Review any externally reported cases the system missed.

Back-testing: Run new or modified rules against 12+ months of historical data. Measure impact on alert volume and false positive rates. BaFin expects back-testing before any rule change goes to production.

Model Validation (MaRisk AT 7.2)


Running Under the MCP Orchestrator

The Compliance Monitor maps to WorkingAgents as follows:

MCP Tools:

System Prompt Context: The agent’s system prompt includes: current GwG thresholds, FATF typology definitions, institution-specific risk appetite parameters, and SAR template structure. The LLM operates within these boundaries.

Trigger Conditions:

Output: Timestamped alert with SAR draft, risk score, rule citations, recommended action. Displayed in the unified dashboard timeline alongside the other four agents.
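The output record pushed to the timeline might carry these fields (names are assumptions; values echo the demo scenario below):

```python
# Illustrative output record for the dashboard timeline (field names assumed).
alert_output = {
    "timestamp": "2026-02-22T02:16:00Z",
    "agent": "compliance_monitor",
    "risk_score": 87,
    "rule_citations": ["structuring", "GwG Section 43"],
    "sar_draft": "Structuring pattern detected. ...",
    "recommended_action": "escalate_to_L2",
}
```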


Demo Flow

The money shot: a live transaction stream scrolling by. Normal payments — salary deposits, utility bills, retail purchases. Then the pattern emerges. Eight transfers from the same sender, each EUR 9,750, within 36 hours. The rule engine flags. The ML model scores it at 87/100. The LLM drafts:

“Structuring pattern detected. Sender DE-4472 executed 8 transactions totaling EUR 78,000 between 2026-02-20 14:30 and 2026-02-22 02:15, each below the EUR 10,000 reporting threshold (GwG Section 10(3)). Average amount: EUR 9,750. Historical average for this customer: EUR 2,100/month. This represents a 37x deviation from baseline. Pattern consistent with FATF structuring typology. Recommended action: escalate to L2 investigation, consider Verdachtsmeldung filing under GwG Section 43.”

The compliance officer reviews, edits one sentence, hits file. The dashboard shows the SAR queued for goAML submission. The freeze timer starts.

That is what AI compliance monitoring looks like when it works.


Beyond Drafting: Execute the Recommendation

Currently, the agent drafts the SAR and presents it for MLRO review. The next step: a one-click “Execute” button that pre-populates the goAML submission form, stages the Verdachtsmeldung for final sign-off, and triggers the 3-day transaction freeze — all with a single approval.

This transforms the Compliance Monitor from “helpful assistant that drafts reports” to “autonomous compliance operations” — with guardrails. Approval workflows ensure the MLRO still signs off. Rollback mechanisms allow freeze cancellation. The audit trail captures not just the recommendation but the execution.

The consulting differentiator: Generic AI can write text. This agent speaks GwG. It knows the goAML field structure, the FIU submission format, the freeze timeline under §46 GwG, and the tipping-off prohibition under §47. That regulatory integration — not the LLM itself — is what justifies premium pricing and makes the service impossible to replicate with ChatGPT and a prompt.