# Alarm

A persistent alarm scheduler backed by SQLite with full observability, retry logic, and timeout detection.

---

## Table of Contents

1. [Overview](#overview)
2. [Features](#features)
3. [Dependencies](#dependencies)
4. [Database Schema](#database-schema)
5. [Usage](#usage)
6. [API Reference](#api-reference)
7. [Internals / Flow](#internals--flow)
8. [WebSocket Interface](#websocket-interface)
9. [Troubleshooting](#troubleshooting)
10. [Related Documentation](#related-documentation)

---

## Overview

Alarm is a registered GenServer that manages scheduled tasks with database persistence. When an alarm fires, it sends a message to a target process via `GenServer.cast/2`. All alarms are stored in SQLite through `Sqler`, so they survive process restarts and application reboots.

On initialization, Alarm starts its own `Sqler` instance (`:alarm` database) and a `Timer` process. It then scans the database for pending alarms and schedules the next one. Only one timer runs at a time -- when it fires, the next pending alarm is loaded from the database and scheduled.

The module provides full observability into alarm lifecycle: query individual alarms, filter history by status or process, view aggregate statistics, and inspect failed deliveries. A periodic sweep detects alarms that fired but never completed within the stale threshold (5 minutes). Target processes can report back via `Alarm.completed/2` and `Alarm.delivery_failed/2`, enabling end-to-end tracking and automatic retry with exponential backoff.
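The exponential backoff mentioned above can be sketched as follows. This is a minimal illustration, assuming the delay for the N-th retry is `backoff_ms * 2^(N-1)` with N starting at 1, which matches the 5s, 10s, 20s progression described in this document. `BackoffSketch`, `delay_ms/2`, and `schedule/2` are hypothetical names and not part of Alarm's API.

```elixir
# Hypothetical helper for illustration only; not part of Alarm.
defmodule BackoffSketch do
  # Delay before the given retry attempt (1-based), in milliseconds.
  # Assumes the formula backoff_ms * 2^(attempt - 1).
  def delay_ms(backoff_ms, attempt)
      when is_integer(backoff_ms) and backoff_ms >= 0 and
             is_integer(attempt) and attempt >= 1 do
    backoff_ms * Integer.pow(2, attempt - 1)
  end

  # All retry delays for an alarm configured with `max_retries`.
  # The explicit step (//1) makes the range empty when max_retries is 0.
  def schedule(backoff_ms, max_retries)
      when is_integer(max_retries) and max_retries >= 0 do
    Enum.map(1..max_retries//1, &delay_ms(backoff_ms, &1))
  end
end
```

With the default `backoff_ms` of 5000 and `max_retries: 3`, `BackoffSketch.schedule(5000, 3)` evaluates to `[5000, 10000, 20000]`.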
## Features

| Feature | Description |
|---------|-------------|
| SQLite persistence | All alarms stored via Sqler -- survive restarts |
| Natural language parsing | "tomorrow at 5pm", "in 30 minutes" via Chronic |
| Crash recovery | Pending alarms restored from database on init |
| Single-timer optimization | Only the next alarm is held in memory; rest live in SQLite |
| Retry with backoff | Configurable `max_retries` with exponential backoff on delivery failure |
| Completion tracking | Target processes report success/failure via `completed/2` and `delivery_failed/2` |
| Duration measurement | `duration_ms` computed from `started_at` to `completed_at` |
| Timeout sweep | Periodic sweep (60s) detects stale alarms that fired but never completed (5min threshold) |
| Provenance | `created_by` (user ID) and `created_via` (channel: mcp, web, iex) for audit |
| History & stats | Query by status, process, time range; aggregate counts and average duration |
| Soft deletes | Alarms marked with `started_at`, `cancelled_at`, or `completed_at` -- never hard-deleted |
| Optimistic locking | Updates use `updated_at` for safe concurrent modification |
| Safe deserialization | Uses `to_existing_atom/1` and `binary_to_term(bin, [:safe])` to prevent injection |
| WebSocket integration | Registered in `WsRegistry` for live UI queries (list, count, history, stats, failed, get) |

## Dependencies

| Module | Purpose |
|--------|---------|
| `Sqler` | SQLite database wrapper -- owns the `:alarm` database |
| `Timer` | Manages a single `Process.send_after/3` reference -- fires alarms |
| `Chronic` | Parses natural language time strings into Erlang datetime tuples |
| `SystemTimezone` | Resolves the system's local timezone for Chronic |
| `WsRegistry` | Elixir Registry for WebSocket process discovery |

## Database Schema

Table `alarm` on the Sqler instance started with path `"alarm"`:

| Column | Type | Default | Description |
|--------|------|---------|-------------|
| `id` | INTEGER PK | | Millisecond timestamp (Sqler convention) |
| `updated_at` | INTEGER | | Used for optimistic locking |
| `set_at` | INTEGER NOT NULL | | Unix timestamp (seconds) when the alarm should fire |
| `process` | TEXT NOT NULL | | Target process name (e.g. `"Elixir.Pushover"`) |
| `message` | BLOB NOT NULL | | Erlang binary term (`:erlang.term_to_binary/1`) |
| `started_at` | INTEGER | | Set when alarm fires -- first-fire timestamp preserved on retries |
| `reply` | TEXT | | Legacy field for deserialization failure reasons |
| `cancelled_at` | INTEGER | | Set when alarm is cancelled or reset |
| `completed_at` | INTEGER | | Set when target reports success via `completed/2` |
| `duration_ms` | INTEGER | | `(completed_at - started_at) * 1000` |
| `result` | TEXT | | Outcome: `"ok"`, `"FAILED: reason"`, `"RETRY: reason"`, `"timeout_unknown"` |
| `created_by` | TEXT | | User ID that created the alarm (provenance) |
| `created_via` | TEXT | | Channel: `"mcp"`, `"web"`, `"iex"` (provenance) |
| `max_retries` | INTEGER | 0 | Maximum retry attempts on delivery failure |
| `attempts` | INTEGER | 0 | Current retry count |
| `backoff_ms` | INTEGER | 5000 | Base backoff delay (doubled per retry: 5s, 10s, 20s...) |

**Alarm states** (determined by column values, not a status column):

| State | Condition |
|-------|-----------|
| Pending | `started_at IS NULL AND cancelled_at IS NULL` |
| Fired | `started_at IS NOT NULL AND completed_at IS NULL AND cancelled_at IS NULL` |
| Completed | `completed_at IS NOT NULL` |
| Failed | `result LIKE 'FAILED:%'` |
| Cancelled | `cancelled_at IS NOT NULL` |
| Timeout | `result = 'timeout_unknown'` |

> The query `set_at - id/1000` computes the delay between record creation and the scheduled fire time. This is intentional -- Sqler IDs are millisecond timestamps.
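Because state is derived from column values, a row can satisfy more than one condition in the table above (a permanently failed alarm also matches the Fired condition, for example). The sketch below shows one way to map a row to a single state. The precedence order is an assumption made for illustration, since this document does not define one, and `AlarmStateSketch.classify/1` is a hypothetical helper rather than part of Alarm. It takes a string-keyed map shaped like the one returned by `Alarm.get/1`.

```elixir
# Hypothetical helper for illustration only; not part of Alarm.
defmodule AlarmStateSketch do
  # Maps a string-keyed alarm row to one state atom.
  # ASSUMPTION: the precedence below (cancelled, timeout, failed,
  # completed, fired, pending) is not specified by the documentation.
  def classify(row) when is_map(row) do
    result = row["result"]

    cond do
      row["cancelled_at"] != nil -> :cancelled
      result == "timeout_unknown" -> :timeout
      is_binary(result) and String.starts_with?(result, "FAILED:") -> :failed
      row["completed_at"] != nil -> :completed
      row["started_at"] != nil -> :fired
      true -> :pending
    end
  end
end
```

Under this ordering, a row whose `result` starts with `"RETRY:"` and has no `completed_at` is classified as `:fired`, which agrees with the Fired condition in the table.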
## Usage

### Schedule an alarm with a Unix timestamp

```elixir
{:ok, id} = Alarm.set_timer(1736882445, Pushover, {:send, %{title: "Reminder", message: "Call client"}})
```

### Schedule with natural language

```elixir
{:ok, id} = Alarm.set_timer("tomorrow at 9am", Pushover, {:send, %{title: "Morning standup"}})
{:ok, id} = Alarm.set_timer("in 30 minutes", Pushover, {:send, %{title: "Break time"}})
```

### Schedule with options (provenance, retry)

```elixir
{:ok, id} = Alarm.set_timer(
  "tomorrow at 9am",
  Pushover,
  {:send, %{message: "Follow up with client"}},
  created_by: "1",
  created_via: "mcp",
  max_retries: 3,
  backoff_ms: 10_000
)
```

### Query alarm status

```elixir
{:ok, alarm} = Alarm.get(id)
# => {:ok, %{"id" => 1709395200000, "set_at" => 1709398800, "result" => "ok", ...}}
```

### Query history with filters

```elixir
Alarm.history(status: "completed", limit: 10)
Alarm.history(status: "failed", process: "Elixir.Pushover", since: 1709395200)
```

### View aggregate stats

```elixir
Alarm.stats()
# => %{total: 42, pending: 3, fired: 1, completed: 30, failed: 2, cancelled: 6, timeout_unknown: 0, avg_duration_ms: 1250}
```

### View recent failures

```elixir
Alarm.failed(limit: 5)
```

### Cancel a scheduled alarm

```elixir
{:ok, :cancelled} = Alarm.cancel(id)
```

### List all pending alarms

```elixir
Alarm.list_ids()
# => [[1709395200000, 1709398800, 3600]]
# [id, set_at, time_offset]
```

### Reset all alarms

```elixir
Alarm.reset()
```

### Parse a time string without scheduling

```elixir
unix = Alarm.parse_timestamp("next friday at 3pm")
# => 1709913600
```

## API Reference

### `start_link/1`

Starts the Alarm GenServer, registered under the module name.

**Parameters:**
- `opts` (keyword) -- `db_path` (default `"alarm"`), `ws_registry` (default `WsRegistry`)

**Returns:** `{:ok, pid}` or `{:error, reason}`

```elixir
{:ok, pid} = Alarm.start_link()
```

---

### `set_timer/3`

Schedules an alarm to fire at a specific time. When it fires, `message` is sent to `process` via `GenServer.cast/2`.

**Parameters:**
- `set_at` (integer | string) -- Unix timestamp in seconds, or a natural language string
- `process` (atom) -- registered name of the target process
- `message` (term) -- any Erlang term to send when the alarm fires

**Returns:** `{:ok, id}` or `{:error, reason}`

```elixir
{:ok, id} = Alarm.set_timer(1736882445, Pushover, {:send, %{title: "Alert"}})
{:ok, id} = Alarm.set_timer("in 30 minutes", Pushover, {:send, %{title: "Reminder"}})
```

---

### `set_timer/4`

Schedules an alarm with additional options for provenance and retry behavior.

**Parameters:**
- `set_at` (integer | string) -- Unix timestamp or natural language string
- `process` (atom) -- registered name of the target process
- `message` (term) -- any Erlang term to send when the alarm fires
- `opts` (keyword) -- options:
  - `:created_by` (string) -- user ID for audit trail
  - `:created_via` (string) -- channel: `"mcp"`, `"web"`, `"iex"`
  - `:max_retries` (integer) -- retry attempts on delivery failure (default 0)
  - `:backoff_ms` (integer) -- base backoff in ms, doubled per retry (default 5000)

**Returns:** `{:ok, id}` or `{:error, reason}`

```elixir
{:ok, id} = Alarm.set_timer(
  "tomorrow at 9am",
  Pushover,
  {:send, %{message: "Follow up"}},
  created_by: "1",
  created_via: "mcp",
  max_retries: 2,
  backoff_ms: 5000
)
```

---

### `cancel/1`

Cancels a pending alarm by its database ID. Sets `cancelled_at` and reschedules the next pending alarm.

**Parameters:**
- `id` (integer) -- the alarm's database ID

**Returns:** `{:ok, :cancelled}` or `{:error, :not_found}`

```elixir
{:ok, :cancelled} = Alarm.cancel(1709395200000)
```

---

### `reset/0`

Cancels all pending alarms by setting `cancelled_at` on every pending row. Also cancels the active timer.

**Returns:** `:ok`

```elixir
Alarm.reset()
```

> Soft delete -- alarm records remain in the database for audit purposes.
---

### `list_ids/0`

Returns all pending alarms as a list of `[id, set_at, time_offset]` triples, ordered by `set_at` ascending.

**Returns:** `[[integer, integer, integer]]`

```elixir
Alarm.list_ids()
# => [[1709395200000, 1709398800, 3600]]
```

---

### `get/1`

Retrieves a single alarm by ID with all fields deserialized into a map.

**Parameters:**
- `id` (integer) -- the alarm's database ID

**Returns:** `{:ok, map}` or `{:error, :not_found}`

```elixir
{:ok, alarm} = Alarm.get(1709395200000)
# => {:ok, %{"id" => 1709395200000, "set_at" => 1709398800, "process" => "Elixir.Pushover",
#            "result" => "ok", "duration_ms" => 1250, "created_via" => "mcp", ...}}
```

The `message` field is deserialized from binary and returned as an `inspect`-ed string for display safety.

---

### `history/1`

Queries alarm history with optional filters.

**Parameters:**
- `opts` (keyword) -- filter options:
  - `:status` -- `"pending"`, `"fired"`, `"completed"`, `"failed"`, `"cancelled"`
  - `:process` -- filter by target process name (e.g. `"Elixir.Pushover"`)
  - `:since` -- Unix timestamp (seconds) -- only alarms created after this time
  - `:limit` -- max results (default 50)

**Returns:** `[map]` -- list of alarm maps, newest first

```elixir
Alarm.history(status: "completed", limit: 10)
Alarm.history(process: "Elixir.Pushover", since: 1709395200)
Alarm.history(status: "failed")
```

---

### `stats/0`

Returns aggregate alarm statistics.
**Returns:** map with keys:

| Key | Description |
|-----|-------------|
| `total` | All alarms ever created |
| `pending` | Waiting to fire |
| `fired` | Fired but not yet completed |
| `completed` | Successfully completed |
| `failed` | Failed (result starts with `"FAILED:"`) |
| `cancelled` | Cancelled by user or reset |
| `timeout_unknown` | Fired but never completed within stale threshold |
| `avg_duration_ms` | Average duration of completed alarms (ms) |

```elixir
Alarm.stats()
# => %{total: 42, pending: 3, fired: 1, completed: 30, failed: 2,
#      cancelled: 6, timeout_unknown: 0, avg_duration_ms: 1250}
```

---

### `failed/1`

Returns recently failed alarms (result starts with `"FAILED:"` or equals `"timeout_unknown"`).

**Parameters:**
- `opts` (keyword) -- `:limit` (default 20)

**Returns:** `[map]` -- list of alarm maps, newest first

```elixir
Alarm.failed(limit: 5)
```

---

### `fired/1`

Called by `Timer` when an alarm fires. Marks `started_at` in the database (preserving the first-fire timestamp on retries) and schedules the next pending alarm. **Do not call directly.**

**Parameters:**
- `id` (integer) -- the alarm's database ID

**Returns:** `:ok`

```elixir
# Called internally by Timer -- not for direct use
Alarm.fired(1709395200000)
```

---

### `completed/2`

Called by target processes to report successful completion. Sets `completed_at`, computes `duration_ms`, and stores the result string.

**Parameters:**
- `id` (integer) -- the alarm's database ID
- `result` (string) -- outcome description (e.g. `"ok"`, `"sent to 3 recipients"`)

**Returns:** `:ok`

```elixir
# Called by target process after handling the alarm message
Alarm.completed(alarm_id, "ok")
```

---

### `delivery_failed/2`

Called by target processes to report delivery failure. If retries remain, schedules a retry with exponential backoff. Otherwise marks the alarm as permanently failed.
**Parameters:**
- `id` (integer) -- the alarm's database ID
- `reason` (string) -- failure description

**Returns:** `:ok`

**Retry behavior:**
- Backoff doubles per attempt: `backoff_ms * 2^(attempt-1)`
- Default: 5s -> 10s -> 20s -> 40s...
- On retry, `result` is set to `"RETRY: reason (attempt N/max)"`
- On exhaustion, `result` is set to `"FAILED: reason (after N attempts)"`

```elixir
# Called by target process when delivery fails
Alarm.delivery_failed(alarm_id, "HTTP 503 from Pushover API")
```

---

### `parse_timestamp/2`

Parses a natural language time string into a Unix timestamp using the Chronic library.

**Parameters:**
- `chronic_str` (string) -- natural language time (e.g. `"tomorrow at 5pm"`, `"in 2 hours"`)
- `tz` (string | nil) -- optional timezone override (defaults to system timezone)

**Returns:** Unix timestamp (integer, seconds since epoch)

**Raises:** `ArgumentError` if the string cannot be parsed

```elixir
Alarm.parse_timestamp("tomorrow at 9am")
# => 1709535600

Alarm.parse_timestamp("next friday at 3pm", "America/New_York")
# => 1709913600
```

## Internals / Flow

### Initialization

```
Alarm.init/1
  |-- Start Sqler with path "alarm"
  |-- Start Timer (singleton -- reuses existing if running)
  |-- CREATE TABLE IF NOT EXISTS alarm (...)
  |-- ALTER TABLE for new columns (idempotent)
  |-- Query next pending alarm (earliest set_at)
  |-- Schedule it with Timer.set_timer/4
  |-- Register in WsRegistry as {:ws, "alarm"}
  +-- Start timeout sweep (every 60s)
```

### Alarm Lifecycle

```
set_timer(set_at, process, message, opts)
  |-- Insert row into alarm table (with provenance + retry config)
  |-- Check: does new alarm fire before current timer?
  |     |-- Yes -> Timer.set_timer(set_at, process, message, id)
  |     +-- No  -> do nothing (current timer fires first)
  +-- Return {:ok, id}

[Timer fires]
  |-- Timer sends GenServer.cast(process, message)
  |-- Timer calls Alarm.fired(id)
  |-- Alarm marks row: started_at = now (preserved on retries)
  |-- Alarm queries next pending alarm
  +-- Alarm schedules it with Timer

[Target process completes successfully]
  |-- Target calls Alarm.completed(id, "ok")
  |-- Alarm sets completed_at, computes duration_ms, stores result
  +-- End of lifecycle

[Target process fails]
  |-- Target calls Alarm.delivery_failed(id, reason)
  |-- If attempts < max_retries:
  |     |-- Compute backoff: backoff_ms * 2^(attempts-1)
  |     |-- Schedule retry via Timer at (now + backoff)
  |     +-- Set result: "RETRY: reason (attempt N/max)"
  +-- If retries exhausted:
        |-- Set result: "FAILED: reason (after N attempts)"
        +-- Schedule next pending alarm

[Timeout sweep -- every 60s]
  |-- Find alarms where started_at < (now - 300s)
  |     AND completed_at IS NULL AND cancelled_at IS NULL
  |     AND result NOT LIKE 'FAILED:%' AND result NOT LIKE 'RETRY:%'
  +-- Mark as result: "timeout_unknown"
```

### Safe Deserialization

When restoring alarms from the database, Alarm uses defensive deserialization:

- `String.to_existing_atom/1` for the process name -- prevents atom table exhaustion
- `:erlang.binary_to_term(bin, [:safe])` for the message -- prevents arbitrary code execution

If deserialization fails, the alarm is marked as failed and the next alarm in the queue is scheduled.
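The two defensive calls above can be combined as in the sketch below. It is an illustration under assumptions, not Alarm's actual code: `SafeDecodeSketch.decode/2` and the `:deserialization_failed` error atom are hypothetical. Both `String.to_existing_atom/1` and `:erlang.binary_to_term/2` with `[:safe]` raise `ArgumentError` on bad input, and the `:safe` option makes decoding fail for data that would create new atoms or external function references.

```elixir
# Hypothetical helper for illustration only; not part of Alarm.
defmodule SafeDecodeSketch do
  # Turns a stored process name and message blob back into terms.
  # Returns {:ok, process_atom, message} or {:error, :deserialization_failed}.
  def decode(process_name, message_bin)
      when is_binary(process_name) and is_binary(message_bin) do
    # Raises ArgumentError if the atom is not already in the atom table.
    process = String.to_existing_atom(process_name)
    # Raises ArgumentError for malformed or unsafe binaries.
    message = :erlang.binary_to_term(message_bin, [:safe])
    {:ok, process, message}
  rescue
    ArgumentError -> {:error, :deserialization_failed}
  end
end
```

A caller could match on the error tuple to mark the row as failed and move on to the next pending alarm, as described above.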
## WebSocket Interface

Alarm registers in `WsRegistry` under `{:ws, "alarm"}` and handles these WebSocket calls:

| Call | Args | Returns |
|------|------|---------|
| `{:ws, "list", _}` | -- | List of pending alarm maps `[%{id, set_at, time_offset}]` |
| `{:ws, "count", _}` | -- | Integer count of pending alarms |
| `{:ws, "history", args}` | `%{"status" => ..., "limit" => ..., "process" => ..., "since" => ...}` | Filtered alarm history |
| `{:ws, "stats", _}` | -- | Aggregate statistics map |
| `{:ws, "failed", args}` | `%{"limit" => ...}` | Recent failed alarms |
| `{:ws, "get", args}` | `%{"id" => ...}` | Single alarm map |

## Troubleshooting

**Alarm did not fire**
- Check if the alarm exists: `Alarm.list_ids()` -- if absent, it may have been cancelled or already fired
- Check if Timer is alive: `Process.whereis(Timer)` -- Alarm starts Timer in `init/1`
- Check logs for "Failed to deserialize" -- the process name may no longer exist as an atom
- Check `Alarm.get(id)` -- the `result` field shows the outcome

**Alarm fired but shows `timeout_unknown`**
- The target process did not call `Alarm.completed/2` within 5 minutes
- Ensure the target process calls `Alarm.completed(alarm_id, result)` after handling the message

**Delivery retries not working**
- Check that `max_retries` was set when creating the alarm: `Alarm.get(id)` -- verify `max_retries > 0`
- The target process must call `Alarm.delivery_failed/2` to trigger retries. If the target silently fails, use the timeout sweep as a safety net.

**Natural language parsing fails**
- Chronic requires a recognizable format. Try simpler phrases: `"in 5 minutes"`, `"tomorrow at 9am"`, `"next friday at 3pm"`
- Override timezone: `Alarm.parse_timestamp("tomorrow at 9am", "America/New_York")`

**Alarms not restored after restart**
- Verify the SQLite file exists: `data/alarm.sqlite`
- Only pending rows (both `started_at IS NULL` and `cancelled_at IS NULL`) are rescheduled

**"Unknown process" error in logs**
- The target process atom must exist in the BEAM atom table. If the module was renamed, the alarm is marked as failed and skipped.

## Related Documentation

- [Timer](timer.md) -- single-timer manager used by Alarm
- [Sqler](sqler.md) -- SQLite database wrapper
- [Pushover](pushover.md) -- common alarm target for push notifications

---

*Source: `lib/alarm.ex` -- Last updated: 2026-03-13*