The Compliance Monitor: AI-Powered Transaction Surveillance for German Banking

James Aspinwall — February 2026


This is the flagship agent in the Solaris demo — a real-time transaction monitor that scans payment streams for money laundering patterns and drafts Suspicious Activity Reports. It is also the template for the other four agents: ingest data, detect patterns, generate narrative, present to a human for decision.

What follows is a deep-dive into what it takes to build this agent properly — not as a toy, but as something a BaFin examiner could look at without flinching.


What the Agent Does

A rule engine evaluates a stream of banking transactions against known suspicious typologies. When a pattern matches, the LLM receives the flagged transactions with full context — customer profile, historical behavior, regulatory thresholds — and drafts a Verdachtsmeldung (SAR) narrative. The narrative explains what happened, why it is suspicious, and which regulatory obligation it triggers.

The compliance officer reviews, edits, and files. The agent drafts; the human decides.


The Regulatory Landscape

German AML compliance sits at the intersection of three regulatory layers.

German law — GwG (Geldwäschegesetz): the Anti-Money Laundering Act. Section 43 establishes the suspicious activity reporting obligation; Section 10 sets the customer due diligence duties.

EU directives — the AML package: the successive Anti-Money Laundering Directives (through AMLD6) and the 2024 AML package, including the single-rulebook regulation and the new EU authority, AMLA.

FATF guidance: the Financial Action Task Force's 40 Recommendations and typology reports, which define the patterns supervisors expect monitoring systems to detect.

Penalties are real. Section 56 GwG allows fines up to EUR 5 million or 10% of annual group turnover for serious breaches. N26 had its growth capped by BaFin in 2021 for inadequate AML controls. Deutsche Bank paid approximately EUR 16 million in 2020. The compliance officer can face personal criminal liability.


Input: The Transaction Stream

The agent ingests transaction data with these fields per record:

Field                             Purpose
transaction_id                    Unique reference
amount, currency                  Value and denomination
sender_id, receiver_id            Customer identifiers
sender_country, receiver_country  Geographic risk signals
timestamp                         Temporal pattern detection
account_type                      Product risk factor
account_dormancy_days             Dormant account detection
transaction_type                  Cash, wire, securities, etc.
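A single record with these fields might look like the following (synthetic values; the exact serialization is an assumption, not a fixed schema):

```python
# One illustrative transaction record with the fields from the table above.
record = {
    "transaction_id": "TX-000187",
    "amount": 9750.00,
    "currency": "EUR",
    "sender_id": "DE-4472",
    "receiver_id": "DE-9031",
    "sender_country": "DE",
    "receiver_country": "CY",
    "timestamp": "2026-02-20T14:30:00Z",
    "account_type": "retail_checking",
    "account_dormancy_days": 0,
    "transaction_type": "wire",
}
```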

For the demo: 10,000 synthetic records with 50-100 seeded suspicious patterns across three categories.


Processing: The Detection Pipeline

The pipeline has four stages.

Stage 1 — Sanctions Screening (< 100ms)

Before any transaction executes, it runs through sanctions checks. Hard matches freeze the transaction immediately. This is a gatekeeper, not the compliance monitor’s primary job, but it must exist.

Stage 2 — Rule Engine (< 5 minutes)

Pattern matching against known typologies:

Structuring (smurfing): Multiple transactions from the same sender, each EUR 9,500–9,900 (just below the EUR 10,000 reporting threshold), within a 48-hour window. The rule aggregates linked transactions and flags when the total exceeds the threshold while individual amounts stay under it.

Rapid cross-border flows: Funds moving through 3+ countries in under 24 hours, amounts above EUR 50,000, with no apparent business rationale matching the customer profile.

Dormant account activation: Accounts inactive for 180+ days suddenly receiving or sending large transfers. Especially suspicious when combined with changes to account signatories or contact information.
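The structuring rule above can be sketched as follows — a minimal illustration, not the production rule. Field names follow the input table; the thresholds come from the rule description:

```python
from collections import defaultdict
from datetime import datetime, timedelta

THRESHOLD = 10_000            # EUR reporting threshold
BAND = (9_500, 9_900)         # just-below-threshold band from the rule above
WINDOW = timedelta(hours=48)

def detect_structuring(transactions):
    """Flag senders whose just-below-threshold transactions aggregate past
    the reporting threshold inside a 48-hour window."""
    by_sender = defaultdict(list)
    for tx in transactions:
        if BAND[0] <= tx["amount"] <= BAND[1]:
            by_sender[tx["sender_id"]].append(tx)

    alerts = []
    for sender, txs in by_sender.items():
        txs.sort(key=lambda t: t["timestamp"])
        for i, first in enumerate(txs):
            # aggregate every linked transaction within 48h of this one
            linked = [t for t in txs[i:]
                      if t["timestamp"] - first["timestamp"] <= WINDOW]
            total = sum(t["amount"] for t in linked)
            if len(linked) >= 2 and total > THRESHOLD:
                alerts.append({"sender_id": sender,
                               "count": len(linked),
                               "total": total})
                break  # one alert per sender is enough here
    return alerts
```

The individual amounts stay legal on their own; only the aggregation over the 48-hour window exposes the pattern, which is why the rule must group by sender before comparing against the threshold.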

Additional typologies to implement:

Stage 3 — ML Risk Scoring

Flagged transactions receive an ML-generated risk score (0-100). The model considers customer profile deviation, counterparty risk, geographic risk factors, and temporal patterns. This score prioritizes the analyst workqueue — highest risk alerts surface first.

The industry false positive rate for rule-based monitoring is 95-99%. A good ML layer can bring this down to 85-90%. Even that improvement is transformative: it means analysts spend their time on real risks instead of clearing noise.
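A deployed system would use a trained model; as a sketch only, the four factor families from Stage 3 can be combined into a 0-100 score like this (weights and feature names are assumptions, not the actual model):

```python
# Illustrative only: weights and feature names are hypothetical.
def risk_score(features):
    """Combine the Stage 3 factor families into a 0-100 priority score."""
    score = (40 * min(features["profile_deviation"], 1.0)  # vs. customer baseline
             + 25 * features["counterparty_risk"]          # 0-1 counterparty rating
             + 20 * features["geo_risk"]                   # 0-1 jurisdiction risk
             + 15 * features["temporal_anomaly"])          # 0-1 timing irregularity
    return round(min(score, 100.0))
```

Whatever form the real model takes, its output must remain decomposable into per-feature contributions, since the explainability requirements discussed later rule out opaque scores.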

Stage 4 — LLM Narrative Generation

For each alert that survives scoring, the LLM receives the flagged transactions with full context: the customer profile, the customer's historical behavior, and the applicable regulatory thresholds.

The LLM drafts a Verdachtsmeldung narrative: what happened, why it is suspicious, which GwG section or FATF typology applies, and what the recommended action is.
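Assembling that context might look like the sketch below. Field names and the prompt layout are illustrative assumptions; the point is that the LLM sees the rule citation, the customer baseline, and the raw transactions together:

```python
def build_narrative_prompt(alert, customer, transactions):
    """Assemble the context the LLM sees before drafting the narrative.
    Field names are illustrative, mirroring the input table earlier."""
    tx_lines = "\n".join(
        f"- {t['timestamp']}  {t['amount']:.2f} {t['currency']}  "
        f"{t['sender_country']} -> {t['receiver_country']}"
        for t in transactions
    )
    return (
        f"Rule triggered: {alert['rule_id']} (risk score {alert['risk_score']}/100)\n"
        f"Customer {customer['id']}: historical monthly average "
        f"EUR {customer['monthly_avg']:.2f}\n"
        "Reporting threshold: EUR 10,000\n"
        f"Flagged transactions:\n{tx_lines}\n\n"
        "Draft a Verdachtsmeldung narrative covering: what happened, why it "
        "is suspicious, the applicable GwG section or FATF typology, and the "
        "recommended action."
    )
```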


The Verdachtsmeldung (SAR)

A German SAR filed with the FIU via the goAML platform must contain:

Reporting entity: Bank name, BaFin registration number, MLRO name and contact.

Subject: Full name, DOB, nationality, address, ID document numbers, account numbers/IBANs, tax ID.

Transactions: Date, time, amount, currency, source/destination, counterparty details, SWIFT/BIC codes, transaction type.

Narrative: This is where the LLM adds value. A detailed explanation of why the activity is suspicious, the deviation from normal behavior, the typology classification, the regulatory obligation triggered, and the recommended action. The narrative must reflect the compliance officer’s professional judgment — the LLM drafts, the officer owns.

Filing timeline: Report filed with FIU Germany via goAML. 3-day transaction freeze begins. FIU can extend by 30 days. All records retained for 5 years minimum.
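The timeline arithmetic is simple but worth encoding once; a sketch, using calendar days as stated above (production code would apply the FIU's working-day rules):

```python
from datetime import date, timedelta

def freeze_end(filed_on: date, fiu_extended: bool = False) -> date:
    """End of the transaction freeze: 3 days after filing,
    plus 30 more if the FIU extends."""
    return filed_on + timedelta(days=3 + (30 if fiu_extended else 0))
```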


Human-in-the-Loop: Non-Negotiable

BaFin and the EU AI Act are unambiguous: AI assists, humans decide.

Must be human decisions: whether to file or dismiss a Verdachtsmeldung, the final wording of the filed narrative, case escalation, and the release of a frozen transaction.

Can be automated: alert generation and routing, sanctions screening of hard matches, ML risk scoring, narrative drafting, and audit logging.

The audit trail captures every step: alert generated (which rule, which data, when), alert assigned (to whom, when), investigation actions (what was reviewed), decision (file/dismiss/escalate, by whom, with what rationale). Five-year retention minimum.
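Each of those steps can emit one append-only record; a minimal sketch, with an assumed schema:

```python
from datetime import datetime, timezone

def audit_event(alert_id, event, actor, detail):
    """One append-only audit record; every pipeline step emits one.
    The schema is illustrative, not a fixed standard."""
    return {
        "alert_id": alert_id,
        "event": event,    # generated | assigned | investigated | decided
        "actor": actor,    # rule id, analyst id, or MLRO id
        "detail": detail,  # e.g. rule citation, review notes, decision rationale
        "at": datetime.now(timezone.utc).isoformat(),
    }
```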


Alerting and Monitoring

Alert severity drives response times:

Level     Threshold                                     Response
Critical  Sanctions match, large structuring pattern    Immediate MLRO notification, transaction freeze
High      Multi-typology match, high-risk jurisdiction  Same-day review, escalation to senior compliance
Medium    Single-rule trigger, moderate risk score      24-48 hour review window
Low       Marginal pattern, low risk score              Batch review, weekly
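The tier assignment reduces to a short cascade; a sketch with assumed flag names and an illustrative score cutoff:

```python
def severity(alert):
    """Map an alert to a response tier (flag names and the 50-point
    medium cutoff are illustrative assumptions)."""
    if alert.get("sanctions_match") or alert.get("large_structuring"):
        return "critical"
    if alert.get("typology_count", 0) >= 2 or alert.get("high_risk_jurisdiction"):
        return "high"
    if alert.get("risk_score", 0) >= 50:
        return "medium"
    return "low"
```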

Industry benchmarks: large banks generate 5,000-50,000 alerts per month. Each alert takes 30-120 minutes to investigate. L1 analysts handle 10-20 alerts/day. L2 investigators handle 3-8 complex cases/day. The ML scoring layer’s primary value is reducing the volume that reaches human eyes.


Compliance: EU AI Act

AML monitoring systems are classified as high-risk AI under the EU AI Act (Regulation 2024/1689, Annex III). Deadline for compliance: August 2, 2026.

Requirements: a risk management system, data governance, technical documentation, automatic logging, transparency toward users, human oversight, and accuracy, robustness, and cybersecurity (Articles 9-15).

The human oversight requirement aligns directly with the MLRO review mandate. The explainability requirement means black-box models are insufficient — every alert must be traceable to specific rules, features, and data points. SHAP values or similar explainability frameworks are required.


Testing and Validation

Detection Rule Validation

Above-the-line testing: Inject known suspicious patterns into the transaction stream. Verify every seeded typology triggers an alert. Document: pattern tested, expected outcome, actual outcome, pass/fail.
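Seeding a known pattern can be as simple as the sketch below (field names follow the input table; the helper is an illustration of the injection step, not a full test harness):

```python
import random
from datetime import datetime, timedelta

def seed_structuring(stream, sender_id, start, n=8):
    """Inject a known structuring pattern into a synthetic stream for
    above-the-line testing; the detection rule must alert on it."""
    for i in range(n):
        stream.append({
            "sender_id": sender_id,
            "amount": round(random.uniform(9_500, 9_900), 2),  # below threshold
            "timestamp": start + timedelta(hours=4 * i),       # inside 48h window
        })
    return stream
```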

Below-the-line testing: Sample transactions that did NOT trigger alerts. Verify they are genuinely non-suspicious. Review any externally reported cases the system missed.

Back-testing: Run new or modified rules against 12+ months of historical data. Measure impact on alert volume and false positive rates. BaFin expects back-testing before any rule change goes to production.

Model Validation (MaRisk AT 7.2)


Running Under the MCP Orchestrator

The Compliance Monitor maps to WorkingAgents as follows:

MCP Tools:

System Prompt Context: The agent’s system prompt includes: current GwG thresholds, FATF typology definitions, institution-specific risk appetite parameters, and SAR template structure. The LLM operates within these boundaries.

Trigger Conditions:

Output: Timestamped alert with SAR draft, risk score, rule citations, recommended action. Displayed in the unified dashboard timeline alongside the other four agents.
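The output record pushed to the timeline might carry these fields (names are assumptions; values echo the demo scenario below):

```python
# Illustrative output record for the dashboard timeline (field names assumed).
alert_output = {
    "timestamp": "2026-02-22T02:16:00Z",
    "agent": "compliance_monitor",
    "risk_score": 87,
    "rule_citations": ["structuring", "GwG Section 43"],
    "sar_draft": "Structuring pattern detected. ...",
    "recommended_action": "escalate_to_L2",
}
```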


Demo Flow

The money shot: a live transaction stream scrolling by. Normal payments — salary deposits, utility bills, retail purchases. Then the pattern emerges. Eight transfers from the same sender, each EUR 9,750, within 36 hours. The rule engine flags. The ML model scores it at 87/100. The LLM drafts:

“Structuring pattern detected. Sender DE-4472 executed 8 transactions totaling EUR 78,000 between 2026-02-20 14:30 and 2026-02-22 02:15, each below the EUR 10,000 reporting threshold (GwG Section 10(3)). Average amount: EUR 9,750. Historical average for this customer: EUR 2,100/month. This represents a 37x deviation from baseline. Pattern consistent with FATF structuring typology. Recommended action: escalate to L2 investigation, consider Verdachtsmeldung filing under GwG Section 43.”

The compliance officer reviews, edits one sentence, hits file. The dashboard shows the SAR queued for goAML submission. The freeze timer starts.

That is what AI compliance monitoring looks like when it works.


Beyond Drafting: Execute the Recommendation

Currently, the agent drafts the SAR and presents it for MLRO review. The next step: a one-click “Execute” button that pre-populates the goAML submission form, stages the Verdachtsmeldung for final sign-off, and triggers the 3-day transaction freeze — all with a single approval.

This transforms the Compliance Monitor from “helpful assistant that drafts reports” to “autonomous compliance operations” — with guardrails. Approval workflows ensure the MLRO still signs off. Rollback mechanisms allow freeze cancellation. The audit trail captures not just the recommendation but the execution.

The consulting differentiator: Generic AI can write text. This agent speaks GwG. It knows the goAML field structure, the FIU submission format, the freeze timeline under §46 GwG, and the tipping-off prohibition under §47. That regulatory integration — not the LLM itself — is what justifies premium pricing and makes the service impossible to replicate with ChatGPT and a prompt.