Most workflow engines are complicated. They have state machines, message queues, supervisor trees, fan-out trackers, and coordination protocols. They treat a workflow as a living process that must be kept alive, checkpointed, and recovered.
There is a simpler model. A workflow is a table. A step is a row. An executor watches for rows with a ready flag, picks them up, and runs each one in its own process. That’s the whole engine.
The Data Model
Two tables.
workflows – one row per workflow instance. Stores the name, the step sequence/flow definition, the input, the status, and who triggered it. The workflow owns the logic of what comes next.
workflow_steps – one row per step. Each row has:
id -- millisecond timestamp, doubles as creation time
workflow_id -- foreign key
name -- "fetch_data", "send_email", "await_approval"
tool -- which MCP tool to call
args_json -- input args
result_json -- output, written when the step completes
status -- pending | ready | running | done | failed
ready_at -- when the ready flag was set (for ordering)
started_at -- when the executor picked it up
completed_at
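The two tables can be sketched as SQLite DDL. Column names follow the field list above; the types and constraints are illustrative assumptions, not a prescribed schema.

```python
import sqlite3

# Illustrative schema for the two tables described above (SQLite).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE workflows (
    id           INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    flow_json    TEXT NOT NULL,          -- step sequence / flow definition
    input_json   TEXT,
    status       TEXT NOT NULL,
    triggered_by TEXT
);

CREATE TABLE workflow_steps (
    id           INTEGER PRIMARY KEY,    -- millisecond timestamp, doubles as creation time
    workflow_id  INTEGER NOT NULL REFERENCES workflows(id),
    name         TEXT NOT NULL,          -- "fetch_data", "send_email", "await_approval"
    tool         TEXT,                   -- NULL for approval gates
    args_json    TEXT,
    result_json  TEXT,
    status       TEXT NOT NULL DEFAULT 'pending'
                 CHECK (status IN ('pending','ready','running','done','failed')),
    ready_at     INTEGER,
    started_at   INTEGER,
    completed_at INTEGER
);
""")
```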
A step starts as pending. An event sets it to ready. The executor picks it up and runs it. When it finishes, the executor asks the workflow what comes next, creates that step, and marks it ready. The cycle continues.
Steps have no awareness of each other. A step does one thing and reports the result. That’s all it knows.
The Executor Owns the Table
The executor is the only process that reads from or writes to the steps table. No other process touches it directly. Alarms, UI events, step completions – they all send messages to the executor. The executor decides what to do with those messages and performs the database operations itself.
This is not a bureaucratic rule. It’s the reason the system stays coherent. One writer means no races, no contention, no need for locks or transactions across processes. The executor is the single source of truth for step state, and it enforces that by being the only one allowed to change it.
If you want to mark a step ready, you don’t write to the database – you send a message to the executor. The executor writes it.
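The single-writer rule can be sketched in a few lines of Python, with a queue standing in for the Elixir-style message passing the article implies. `mark_ready` and `drain` are hypothetical names; the point is that callers enqueue a message and only the executor's loop touches the database.

```python
import queue
import sqlite3

# Minimal stand-in for the steps table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE workflow_steps (id INTEGER PRIMARY KEY, status TEXT)")
db.execute("INSERT INTO workflow_steps VALUES (1, 'pending')")

inbox: queue.Queue = queue.Queue()  # the executor's mailbox

def mark_ready(step_id):
    # What every other process does: send a message, never write the DB.
    inbox.put(("step_ready", step_id))

def drain(db, inbox):
    # The executor is the only writer: it consumes messages and applies them.
    while not inbox.empty():
        kind, step_id = inbox.get()
        if kind == "step_ready":
            db.execute(
                "UPDATE workflow_steps SET status='ready' "
                "WHERE id=? AND status='pending'", (step_id,))
    db.commit()

mark_ready(1)      # an alarm, a button, a webhook...
drain(db, inbox)   # ...and the executor performs the write
```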
What Events Look Like
Events are dumb. They know nothing about workflows, steps, or logic. They carry only a signal and a minimal payload.
An alarm fires – sends {:step_ready, step_id} to the executor. The alarm doesn’t know what the step does or what workflow it belongs to. It just fires the message at the scheduled time.
A user presses a button – the UI sends {:step_ready, step_id} to the executor via REST or MCP. The button doesn’t know what it’s unblocking. It sends the signal.
A step completes – the step process sends {:step_done, step_id, result} to the executor. The step doesn’t know what comes after it. It reports what it produced and exits.
A task completes, a message arrives, a webhook fires – any of these can send {:step_ready, step_id} to the executor. They don’t need to understand the workflow. They just know a step ID and that it should be unblocked.
Every event reduces to one of two messages: step_ready or step_done. The executor handles both.
The Executor’s Two Jobs
On {:step_ready, step_id}:
- Load the step row
- Mark it running
- Spawn a Task process to execute the tool
- Move on
On {:step_done, step_id, result}:
- Write the result to the step row, mark it done
- Ask the workflow: “what comes next after this step?”
- The workflow looks at its flow definition and the result, and returns the next step spec (or nothing, if the workflow is complete)
- If there is a next step, the executor creates it as a new row in the steps table
- Mark the new step ready
- The poll loop picks it up
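Both handlers can be condensed into a sketch. An in-memory dict stands in for the steps table, the tool runs synchronously instead of in a spawned Task process, and `run_tool` / `next_steps` are hypothetical stand-ins for the MCP tool call and the workflow's “what's next?” answer.

```python
import time

# Stand-in for the workflow_steps table: step_id -> row.
steps = {1: {"name": "fetch", "tool": "http_get", "args": {}, "status": "ready", "result": None}}

def run_tool(tool, args):       # stand-in for the MCP tool call
    return {"ok": True}

def next_steps(name, result):   # stand-in for the workflow definition
    return [{"name": "send", "tool": "email", "args": {}}] if name == "fetch" else []

def on_step_done(step_id, result):
    step = steps[step_id]
    step.update(status="done", result=result, completed_at=time.time())
    for spec in next_steps(step["name"], result):   # ask the workflow
        new_id = max(steps) + 1                     # real ids are ms timestamps
        steps[new_id] = {**spec, "status": "ready", "result": None}

def on_step_ready(step_id):
    step = steps[step_id]
    step.update(status="running", started_at=time.time())
    # The real executor spawns a Task process; this sketch runs inline.
    on_step_done(step_id, run_tool(step["tool"], step["args"]))

on_step_ready(1)   # runs "fetch", then creates and readies "send"
```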
The executor doesn’t contain any workflow logic. It doesn’t know what “comes next” means. It asks the workflow. The workflow definition holds the sequence, the branching rules, the conditions. The executor just carries out instructions.
How Branching Works
Branching lives entirely in the workflow definition. When a step completes and the executor asks “what’s next?”, the workflow evaluates the step’s result against its flow definition and returns the appropriate next step spec. One next step – whichever branch applies.
The executor doesn’t see a branch. It sees: here is the next step to create. It creates it and marks it ready. The branching decision was made by the workflow, invisibly, between the question and the answer.
Parallel steps work the same way. The workflow can return multiple next step specs. The executor creates all of them, marks all of them ready, and the poll loop picks them up concurrently. When each one completes, the executor asks the workflow what comes next. The workflow tracks how many parallel branches remain and only returns the join step once all of them are done.
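A flow definition answering “what's next?” might look like the following sketch, assuming a branch function evaluated against the result and a counter for outstanding parallel branches. The flow structure and all names are illustrative.

```python
# Illustrative flow definition: each node says what follows it.
flow = {
    "validate": {"branch": lambda r: "send" if r["ok"] else "fix"},
    "fan":      {"next": [{"name": "a"}, {"name": "b"}]},   # parallel fan-out
    "a":        {"join": "merge"},
    "b":        {"join": "merge"},
}
pending_joins = {"merge": 2}  # how many parallel branches must finish first

def whats_next(step_name, result):
    node = flow.get(step_name, {})
    if "branch" in node:
        # One next step, whichever branch applies; the executor never sees this.
        return [{"name": node["branch"](result)}]
    if "join" in node:
        pending_joins[node["join"]] -= 1
        # Release the join step only once every parallel branch is done.
        return [{"name": node["join"]}] if pending_joins[node["join"]] == 0 else []
    return node.get("next", [])
```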
How Approval Gates Work
An approval gate is a step the executor creates in pending state – not ready. The executor creates the row, sends a Pushover notification to the approver, and stops. Nothing else happens until an external event fires.
When the human approves – button in the UI, MCP tool call, response to a notification – the handler sends {:step_ready, step_id} to the executor. The executor marks the step ready. The poll loop picks it up. Since an approval gate has no tool (its tool column is null), execution is trivial: the executor writes approved as the result, asks the workflow for the next step, and continues.
The approval gate requires no special case in the executor. It’s a pending step waiting for an event. Every step is.
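An approval gate sketches out like this, with `notify` as a hypothetical stand-in for the Pushover call and an in-memory dict for the steps table:

```python
steps = {}  # stand-in for the workflow_steps table

def notify(who, text):
    pass  # stand-in for the real notification call (e.g. Pushover)

def create_approval_gate(step_id, approver):
    # Created pending, with no tool: nothing runs until an event arrives.
    steps[step_id] = {"name": "await_approval", "tool": None, "status": "pending"}
    notify(approver, f"Approve step {step_id}?")

def on_step_ready(step_id):
    step = steps[step_id]
    if step["tool"] is None:
        # No tool to run: record the approval and let the workflow continue.
        step.update(status="done", result="approved")
    # (tool-bearing steps would be spawned as usual here)

create_approval_gate(42, "alice")
# Later, a button press or MCP call delivers {:step_ready, 42}:
on_step_ready(42)
```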
Alarms as Event Sources
Alarms integrate without any coupling. An alarm fires a callback. The callback sends {:step_ready, step_id} to the executor. The alarm knows a step ID and a message shape. Nothing else.
This is enough to do everything time-based:
- Schedule a workflow to run at 8am – create the first step as pending, schedule an alarm for 8am that sends step_ready to the executor
- Retry a failed step in 5 minutes – when the executor handles a step failure, it schedules an alarm for now + 5 minutes and creates a new step row (same spec, incremented attempt counter) in pending state; the alarm unblocks it
- Escalate an unanswered approval – alarm fires after 24 hours, sends step_ready with the escalation step ID; the executor creates a notification step
The alarm system doesn’t need to understand workflows to participate in them. It fires messages. The executor acts on them.
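As a sketch, with `threading.Timer` standing in for the real alarm system: the callback closes over nothing but a step ID and a message shape.

```python
import threading

fired = []  # stand-in for the executor's inbox

def send_to_executor(msg):
    fired.append(msg)

def schedule_alarm(delay_s, step_id):
    # The alarm knows a step ID and a message shape. Nothing else.
    t = threading.Timer(delay_s, send_to_executor, args=[("step_ready", step_id)])
    t.start()
    return t

timer = schedule_alarm(0.05, 7)  # e.g. "retry this step in 5 minutes"
timer.join()                     # wait for the demo alarm to fire
```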
Crash Recovery Is Free
All state is in SQLite. The executor owns the table and is the only writer, so the table is always consistent – there are no partial writes from multiple processes to reconcile.
When the executor restarts, it polls for ready steps it missed and runs them. A watchdog query finds steps stuck in running (the Task process died) and marks them failed or queues a retry. Steps in pending wait untouched until their event arrives.
No in-memory state to reconstruct. No coordination protocol to resume. No message queue to replay. The executor comes back up and the table tells it exactly what to do.
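The restart path reduces to two queries: a watchdog update and the ordinary ready poll. A sketch, assuming a timeout-based watchdog with illustrative column names and timeout value:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE workflow_steps (id INTEGER PRIMARY KEY, status TEXT, started_at INTEGER)")
db.executemany("INSERT INTO workflow_steps VALUES (?,?,?)",
               [(1, "ready",   None),   # missed while the executor was down
                (2, "running", 100),    # Task process died mid-flight
                (3, "pending", None)])  # still waiting for its event

def recover(db, now=10_000, timeout=600):
    # Watchdog: anything 'running' longer than the timeout is presumed dead.
    db.execute("UPDATE workflow_steps SET status='failed' "
               "WHERE status='running' AND ? - started_at > ?", (now, timeout))
    db.commit()
    # The ordinary poll then picks up everything that is ready.
    return [r[0] for r in db.execute(
        "SELECT id FROM workflow_steps WHERE status='ready' ORDER BY id")]

ready = recover(db)  # pending steps are left untouched
```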
What This Architecture Achieves
Each piece knows only what it needs to:
- Steps know how to run a tool and report a result. Nothing else.
- The workflow knows the sequence and the branching logic. It answers “what’s next?” given a step name and a result.
- The executor knows how to run steps, handle events, and ask the workflow for the next step. It owns the table.
- Events (alarms, UI, completions, webhooks) know a step ID and a message shape. Nothing else.
No piece needs to understand the others beyond its immediate interface. Alarms don’t know about workflows. Steps don’t know about other steps. The workflow doesn’t know about execution. The executor doesn’t know about tool logic or branching rules.
This is what minimal coupling actually looks like in practice. Each boundary is a message. Each message is small. Each receiver does exactly one job.
The complexity of a workflow system – sequencing, branching, retries, approvals, scheduling, crash recovery – is fully present. It’s just distributed across pieces that are each individually simple, each individually testable, and each individually replaceable.