James Aspinwall — February 2026
This is the flagship agent in the Solaris demo — a real-time transaction monitor that scans payment streams for money laundering patterns and drafts Suspicious Activity Reports. It is also the template for the other four agents: ingest data, detect patterns, generate narrative, present to a human for decision.
What follows is a deep-dive into what it takes to build this agent properly — not as a toy, but as something a BaFin examiner could look at without flinching.
What the Agent Does
A rule engine evaluates a stream of banking transactions against known suspicious typologies. When a pattern matches, the LLM receives the flagged transactions with full context — customer profile, historical behavior, regulatory thresholds — and drafts a Verdachtsmeldung (SAR) narrative. The narrative explains what happened, why it is suspicious, and which regulatory obligation it triggers.
The compliance officer reviews, edits, and files. The agent drafts; the human decides.
The Regulatory Landscape
German AML compliance sits at the intersection of three regulatory layers.
German law — GwG (Geldwäschegesetz) and KWG:
- Section 25h KWG mandates automated transaction monitoring systems for credit institutions
- Section 43 GwG requires filing a Verdachtsmeldung “without delay” upon suspicion forming — practically within 24-72 hours
- Section 7 GwG requires appointment of a Geldwäschebeauftragter (MLRO) with personal liability for filing decisions
- Section 46 GwG imposes a 3-business-day transaction hold after filing, extendable by the FIU for 30 additional days
- Section 47 GwG criminalizes tipping off — informing the subject that a report has been filed
EU directives — the AML package:
- 6AMLD (Directive (EU) 2018/1673) defines 22 predicate offenses including tax crimes and cybercrime, and requires a maximum imprisonment term of at least four years
- The 2024 EU AML Package introduces the AMLR (Regulation 2024/1624) — a directly applicable single rulebook, no transposition needed
- AMLA (Regulation 2024/1620) establishes the new EU AML Authority in Frankfurt, with direct supervision of highest-risk entities starting 2028
- Cash payment limit of EUR 10,000 harmonized across the EU
FATF guidance:
- FATF Recommendations 10 (CDD), 20 (STR reporting), and 29 (FIUs) set the global standard
- FATF typologies define the three money laundering stages: placement, layering, integration
Penalties are real. Section 56 GwG allows fines up to EUR 5 million or 10% of annual group turnover for serious breaches. N26 had its growth capped by BaFin in 2021 for inadequate AML controls. Deutsche Bank paid approximately EUR 16 million in 2020. The compliance officer can face personal criminal liability.
Input: The Transaction Stream
The agent ingests transaction data with these fields per record:
| Field | Purpose |
|---|---|
| `transaction_id` | Unique reference |
| `amount`, `currency` | Value and denomination |
| `sender_id`, `receiver_id` | Customer identifiers |
| `sender_country`, `receiver_country` | Geographic risk signals |
| `timestamp` | Temporal pattern detection |
| `account_type` | Product risk factor |
| `account_dormancy_days` | Dormant account detection |
| `transaction_type` | Cash, wire, securities, etc. |
For the demo: 10,000 synthetic records with 50-100 seeded suspicious patterns across three categories.
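The demo dataset can be produced by a small generator that emits normal traffic plus seeded structuring bursts. This is a minimal sketch: the field values, ID formats, and burst shape are illustrative assumptions, not the actual demo generator.

```python
import random
from dataclasses import dataclass

@dataclass
class Txn:
    transaction_id: str
    amount: float
    currency: str
    sender_id: str
    receiver_id: str
    sender_country: str
    receiver_country: str
    timestamp: int            # epoch seconds, relative to stream start
    account_type: str
    account_dormancy_days: int
    transaction_type: str

def make_stream(n_normal=10_000, n_structuring=10, seed=42):
    """Generate normal retail traffic plus seeded structuring bursts."""
    rng = random.Random(seed)
    txns = []
    for i in range(n_normal):
        txns.append(Txn(
            transaction_id=f"T{i:06d}",
            amount=round(rng.uniform(10, 5_000), 2),
            currency="EUR",
            sender_id=f"C{rng.randrange(500):04d}",
            receiver_id=f"C{rng.randrange(500):04d}",
            sender_country="DE", receiver_country="DE",
            timestamp=rng.randrange(0, 30 * 86_400),
            account_type="retail",
            account_dormancy_days=0,
            transaction_type="wire",
        ))
    # Seed structuring bursts: 8 transfers each, just under EUR 10,000,
    # from one sender inside a 48-hour window.
    for b in range(n_structuring):
        t0 = rng.randrange(0, 29 * 86_400)
        for k in range(8):
            txns.append(Txn(
                transaction_id=f"S{b:02d}-{k}",
                amount=round(rng.uniform(9_500, 9_900), 2),
                currency="EUR",
                sender_id=f"SUSP{b:02d}",
                receiver_id=f"C{rng.randrange(500):04d}",
                sender_country="DE", receiver_country="DE",
                timestamp=t0 + k * 3_600,
                account_type="retail",
                account_dormancy_days=0,
                transaction_type="cash",
            ))
    return txns
```

Seeding the RNG makes the demo reproducible: every run surfaces the same suspicious senders, which matters when walking an examiner through the pipeline.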
Processing: The Detection Pipeline
The pipeline has four stages.
Stage 1 — Sanctions Screening (< 100ms)
Before any transaction executes, it runs through sanctions checks. Hard matches freeze the transaction immediately. This is a gatekeeper, not the compliance monitor’s primary job, but it must exist.
Stage 2 — Rule Engine (< 5 minutes)
Pattern matching against known typologies:
Structuring (smurfing): Multiple transactions from the same sender, each EUR 9,500–9,900 (just below the EUR 10,000 reporting threshold), within a 48-hour window. The rule aggregates linked transactions and flags when the total exceeds the threshold while individual amounts stay under it.
Rapid cross-border flows: Funds moving through 3+ countries in under 24 hours, amounts above EUR 50,000, with no apparent business rationale matching the customer profile.
Dormant account activation: Accounts inactive for 180+ days suddenly receiving or sending large transfers. Especially suspicious when combined with changes to account signatories or contact information.
Additional typologies to implement:
- Velocity anomaly: transaction frequency 3x+ historical average
- Funnel accounts: many small incoming transfers, single large outgoing
- Round-tripping: funds leave and return through different jurisdictions
- Trade-based laundering: significantly over/under-priced goods in trade finance (price deviation > 30%)
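The structuring rule described above can be sketched as a sliding-window aggregation per sender. The band boundaries, window length, and alert shape here are illustrative assumptions, not production parameters:

```python
from collections import defaultdict

THRESHOLD = 10_000.0       # EUR reporting threshold
WINDOW = 48 * 3_600        # 48-hour window in seconds

def detect_structuring(txns, low=9_500.0):
    """Flag senders whose sub-threshold transactions aggregate past
    EUR 10,000 inside a sliding 48-hour window."""
    by_sender = defaultdict(list)
    for t in txns:
        # Only sub-threshold amounts inside the suspicious band count.
        if low <= t["amount"] < THRESHOLD:
            by_sender[t["sender_id"]].append(t)
    alerts = []
    for sender, ts in by_sender.items():
        ts.sort(key=lambda t: t["timestamp"])
        start, total = 0, 0.0
        for end, t in enumerate(ts):
            total += t["amount"]
            # Shrink the window from the left until it spans <= 48 hours.
            while t["timestamp"] - ts[start]["timestamp"] > WINDOW:
                total -= ts[start]["amount"]
                start += 1
            if total > THRESHOLD and end - start + 1 >= 2:
                alerts.append({"sender_id": sender,
                               "count": end - start + 1,
                               "total": round(total, 2)})
                break  # one alert per sender is enough for triage
    return alerts
```

The key property is the one stated in the rule definition: individual amounts stay under the threshold, but the linked aggregate exceeds it.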
Stage 3 — ML Risk Scoring
Flagged transactions receive an ML-generated risk score (0-100). The model considers customer profile deviation, counterparty risk, geographic risk factors, and temporal patterns. This score prioritizes the analyst workqueue — highest risk alerts surface first.
The industry false positive rate is 95-99%. A good ML layer can bring this to 85-90%. Even that improvement is transformative: it means analysts spend their time on real risks instead of clearing noise.
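In place of a trained model, the scoring idea can be illustrated with a weighted combination of the four feature groups named above. The feature names, normalization, and weights are assumptions for illustration, not a calibrated model:

```python
def risk_score(alert):
    """Combine normalized risk features into a 0-100 triage score.
    Weights are illustrative, not calibrated on real data."""
    features = {
        # Deviation from the customer's own baseline, capped at 10x.
        "profile_deviation": min(alert["amount_vs_baseline"] / 10.0, 1.0),
        "counterparty_risk": alert["counterparty_risk"],   # 0..1
        "geo_risk": alert["geo_risk"],                     # 0..1
        "temporal_anomaly": alert["temporal_anomaly"],     # 0..1
    }
    weights = {"profile_deviation": 0.35, "counterparty_risk": 0.25,
               "geo_risk": 0.25, "temporal_anomaly": 0.15}
    raw = sum(weights[k] * features[k] for k in weights)
    return round(100 * raw)
```

A real deployment would replace this with a supervised model, but the output contract is the same: a single 0-100 number that orders the analyst workqueue.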
Stage 4 — LLM Narrative Generation
For each alert that survives scoring, the LLM receives:
- The flagged transactions with full context
- The customer’s risk profile and transaction history
- The rule or pattern that triggered the alert
- The regulatory obligation that applies
The LLM drafts a Verdachtsmeldung narrative: what happened, why it is suspicious, which GwG section or FATF typology applies, and what the recommended action is.
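Assembling that context into a prompt might look like the sketch below. The template wording and alert field names (`rule_id`, `regulatory_basis`, and so on) are hypothetical; the point is that the LLM only ever sees structured, grounded facts:

```python
import json

SAR_PROMPT = """You are drafting a Verdachtsmeldung narrative for MLRO review.
Facts (use only these; do not invent details):
{facts}
Triggered rule: {rule}
Regulatory basis: {basis}
Explain: what happened, why it is suspicious, the typology,
and the recommended action."""

def build_sar_prompt(alert):
    # Serialize only vetted context; the model never sees raw stream data.
    facts = json.dumps({
        "transactions": alert["transactions"],
        "customer_profile": alert["customer_profile"],
    }, indent=2, default=str)
    return SAR_PROMPT.format(facts=facts,
                             rule=alert["rule_id"],
                             basis=alert["regulatory_basis"])
```

Constraining the prompt to enumerated facts is what keeps the draft auditable: every sentence in the narrative should be traceable back to a field the rule engine supplied.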
The Verdachtsmeldung (SAR)
A German SAR filed with the FIU via the goAML platform must contain:
Reporting entity: Bank name, BaFin registration number, MLRO name and contact.
Subject: Full name, DOB, nationality, address, ID document numbers, account numbers/IBANs, tax ID.
Transactions: Date, time, amount, currency, source/destination, counterparty details, SWIFT/BIC codes, transaction type.
Narrative: This is where the LLM adds value. A detailed explanation of why the activity is suspicious, the deviation from normal behavior, the typology classification, the regulatory obligation triggered, and the recommended action. The narrative must reflect the compliance officer’s professional judgment — the LLM drafts, the officer owns.
Filing timeline: Report filed with FIU Germany via goAML. 3-day transaction freeze begins. FIU can extend by 30 days. All records retained for 5 years minimum.
Human-in-the-Loop: Non-Negotiable
BaFin and the EU AI Act are unambiguous: AI assists, humans decide.
Must be human decisions:
- Filing a Verdachtsmeldung — the MLRO carries personal legal liability
- Dismissing an alert — every dismissal must have documented rationale
- Customer off-boarding for AML reasons
- EDD escalation decisions
Can be automated:
- Alert generation and scoring
- Data enrichment and context gathering
- SAR narrative drafting (reviewed and approved by MLRO)
- Auto-closure of clearly false positives — but only with documented criteria, regular validation, and 5-10% quality sampling
The audit trail captures every step: alert generated (which rule, which data, when), alert assigned (to whom, when), investigation actions (what was reviewed), decision (file/dismiss/escalate, by whom, with what rationale). Five-year retention minimum.
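The audit trail described above amounts to an append-only event log. A minimal sketch, with illustrative event and actor labels (the real schema would be dictated by the institution's retention policy):

```python
from dataclasses import dataclass, asdict, field
import json, time

@dataclass
class AuditEvent:
    alert_id: str
    event: str          # generated | assigned | reviewed | decided
    actor: str          # rule id, analyst id, or MLRO id
    detail: dict        # rule/data refs, rationale, decision outcome
    ts: float = field(default_factory=time.time)

class AuditTrail:
    """Append-only log; every event serializes to one JSON line,
    suitable for write-once storage with 5-year retention."""
    def __init__(self):
        self._events = []

    def record(self, ev: AuditEvent):
        self._events.append(ev)

    def export(self):
        return "\n".join(json.dumps(asdict(e)) for e in self._events)
```

The JSON-lines export is deliberate: one immutable record per event makes it easy to prove to an examiner that nothing was edited after the fact.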
Alerting and Monitoring
Alert severity drives response times:
| Level | Threshold | Response |
|---|---|---|
| Critical | Sanctions match, large structuring pattern | Immediate MLRO notification, transaction freeze |
| High | Multi-typology match, high-risk jurisdiction | Same-day review, escalation to senior compliance |
| Medium | Single-rule trigger, moderate risk score | 24-48 hour review window |
| Low | Marginal pattern, low risk score | Batch review, weekly |
Industry benchmarks: large banks generate 5,000-50,000 alerts per month. Each alert takes 30-120 minutes to investigate. L1 analysts handle 10-20 alerts/day. L2 investigators handle 3-8 complex cases/day. The ML scoring layer’s primary value is reducing the volume that reaches human eyes.
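The severity table above can be encoded as a small classification function plus an SLA map. The alert field names and the exact score cutoffs are assumptions for illustration:

```python
from datetime import timedelta

# Response-time budget per severity level, mirroring the table above.
SLA = {
    "critical": timedelta(hours=0),   # immediate MLRO notification
    "high":     timedelta(hours=24),  # same-day review
    "medium":   timedelta(hours=48),  # 24-48 hour window
    "low":      timedelta(days=7),    # weekly batch review
}

def classify(alert):
    if alert.get("sanctions_match") or alert.get("typology") == "structuring_large":
        return "critical"
    if len(alert.get("typologies", [])) > 1 or alert.get("high_risk_jurisdiction"):
        return "high"
    if alert.get("risk_score", 0) >= 40:
        return "medium"
    return "low"
```

Keeping severity rules in code (rather than analyst discretion) means the SLA clock starts from a documented, testable criterion.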
Compliance: EU AI Act
AML monitoring systems are classified as high-risk AI under the EU AI Act (Regulation 2024/1689, Annex III). Deadline for compliance: August 2, 2026.
Requirements:
- Risk management system (Article 9): continuous risk assessment throughout the AI lifecycle
- Data governance (Article 10): representative training data, bias examination
- Technical documentation (Article 11): comprehensive system documentation before deployment
- Record-keeping (Article 12): automatic event logging, 5+ year retention
- Transparency (Article 13): clear instructions for deployers, performance characteristics, known limitations
- Human oversight (Article 14): ability to understand, interpret, override, and interrupt the AI system
- Accuracy and robustness (Article 15): defined accuracy metrics, resilience against adversarial attacks
The human oversight requirement aligns directly with the MLRO review mandate. The explainability requirement means black-box models are insufficient — every alert must be traceable to specific rules, features, and data points. SHAP values or similar explainability frameworks are required.
Testing and Validation
Detection Rule Validation
Above-the-line testing: Inject known suspicious patterns into the transaction stream. Verify every seeded typology triggers an alert. Document: pattern tested, expected outcome, actual outcome, pass/fail.
Below-the-line testing: Sample transactions that did NOT trigger alerts. Verify they are genuinely non-suspicious. Review any externally reported cases the system missed.
Back-testing: Run new or modified rules against 12+ months of historical data. Measure impact on alert volume and false positive rates. BaFin expects back-testing before any rule change goes to production.
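Above- and below-the-line tests translate directly into automated checks. This sketch uses a deliberately trivial stand-in for the production rule; the test structure, not the rule, is the point:

```python
def structuring_alerts(txns, threshold=10_000.0, low=9_500.0):
    """Minimal stand-in for the production structuring rule under test."""
    total = sum(t["amount"] for t in txns if low <= t["amount"] < threshold)
    return ["structuring"] if total > threshold else []

def test_above_the_line_structuring():
    # Inject a known seeded pattern: 8 transfers of EUR 9,750 from one sender.
    seeded = [{"amount": 9_750.0, "sender_id": "DE-4472"} for _ in range(8)]
    assert structuring_alerts(seeded) == ["structuring"]

def test_below_the_line_clean_traffic():
    # Sampled traffic that did not alert must be genuinely non-suspicious.
    normal = [{"amount": 120.0, "sender_id": "C0001"} for _ in range(8)]
    assert structuring_alerts(normal) == []
```

Running these in CI gives the documented pattern-tested / expected / actual / pass-fail record BaFin expects, for every rule change rather than an annual exercise.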
Model Validation (MaRisk AT 7.2)
- Independent validation team (separate from developers)
- Annual validation minimum, triggered on material changes
- Champion-challenger framework: 3-6 month parallel running for new models
- Adversarial testing: can the model be evaded by gradually shifting behavior?
- Bias testing: geographic risk factors permissible, but nationality/ethnicity cannot be sole drivers
Running Under the MCP Orchestrator
The Compliance Monitor maps to WorkingAgents as follows:
MCP Tools:
- `compliance_scan_transactions` — feeds a batch through the rule engine, returns flagged items
- `compliance_score_alert` — runs ML scoring on a flagged transaction set
- `compliance_draft_sar` — generates Verdachtsmeldung narrative from alert context
- `compliance_alert_status` — retrieves current alert queue with severity and SLA status
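As a rough sketch of the tool surface, here is a hypothetical in-process registry and dispatcher. A real server would register these handlers with an MCP SDK instead; the tool bodies here are placeholders:

```python
# Hypothetical stand-in for the MCP tool surface, for illustration only.
TOOLS = {}

def tool(name):
    """Register a handler under its MCP tool name."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("compliance_scan_transactions")
def scan_transactions(batch):
    # Placeholder rule pass: flag anything in the structuring band.
    return [t["transaction_id"] for t in batch if t["amount"] >= 9_500]

@tool("compliance_alert_status")
def alert_status():
    # Placeholder queue snapshot.
    return {"queue_depth": 0, "sla_breaches": 0}

def dispatch(name, **kwargs):
    """Route an orchestrator call to the named tool."""
    return TOOLS[name](**kwargs)
```

The shape matters more than the implementation: each tool takes structured input, returns structured output, and carries a stable name the orchestrator can call.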
System Prompt Context: The agent’s system prompt includes: current GwG thresholds, FATF typology definitions, institution-specific risk appetite parameters, and SAR template structure. The LLM operates within these boundaries.
Trigger Conditions:
- Continuous: transaction stream ingestion via event feed
- Periodic: batch analysis of daily transaction aggregates
- On-demand: ad hoc investigation of specific customers or patterns
Output: Timestamped alert with SAR draft, risk score, rule citations, recommended action. Displayed in the unified dashboard timeline alongside the other four agents.
Demo Flow
The money shot: a live transaction stream scrolling by. Normal payments — salary deposits, utility bills, retail purchases. Then the pattern emerges. Eight transfers from the same sender, each EUR 9,750, within 36 hours. The rule engine flags. The ML model scores it at 87/100. The LLM drafts:
“Structuring pattern detected. Sender DE-4472 executed 8 transactions totaling EUR 78,000 between 2026-02-20 14:30 and 2026-02-22 02:15, each below the EUR 10,000 reporting threshold (GwG Section 10(3)). Average amount: EUR 9,750. Historical average for this customer: EUR 2,100/month. This represents a 37x deviation from baseline. Pattern consistent with FATF structuring typology. Recommended action: escalate to L2 investigation, consider Verdachtsmeldung filing under GwG Section 43.”
The compliance officer reviews, edits one sentence, hits file. The dashboard shows the SAR queued for goAML submission. The freeze timer starts.
That is what AI compliance monitoring looks like when it works.
Beyond Drafting: Execute the Recommendation
Currently, the agent drafts the SAR and presents it for MLRO review. The next step: a one-click “Execute” button that pre-populates the goAML submission form, stages the Verdachtsmeldung for final sign-off, and triggers the 3-day transaction freeze — all with a single approval.
This transforms the Compliance Monitor from “helpful assistant that drafts reports” to “autonomous compliance operations” — with guardrails. Approval workflows ensure the MLRO still signs off. Rollback mechanisms allow freeze cancellation. The audit trail captures not just the recommendation but the execution.
The consulting differentiator: Generic AI can write text. This agent speaks GwG. It knows the goAML field structure, the FIU submission format, the transaction-hold timeline under §46 GwG, and the tipping-off prohibition under §47. That regulatory integration — not the LLM itself — is what justifies premium pricing and makes the service impossible to replicate with ChatGPT and a prompt.