Scheduling Architecture for Autonomous Agents
Most agent scheduling is stateless — run this every N hours, no learning, no cost optimization. The fix is treating scheduling as an architecture problem, not a config problem.
What breaks with naive scheduling
The most common agent scheduling pattern looks like this: a cron job fires every hour, the agent runs, consumes tokens, and produces output. Sometimes the output is useful. Often it's a duplicate of yesterday's. The scheduler doesn't know the difference.
This pattern has three compounding failure modes:
- Token waste at scale. A monitoring agent that runs hourly whether or not there's anything new to monitor burns tokens on "nothing changed" 90% of the time. At scale, this is significant cost with no return.
- No feedback loop. The scheduler has no signal about whether the last run was useful. It can't adapt. It can't skip. It can't concentrate effort where signal is high.
- Planning mixed with execution. The same session that decides what to do also does it. This means expensive reasoning models get used for cheap execution work — and the schedule can't optimize each separately.
The fix isn't better cron syntax. It's a different architectural model.
Scheduling is a resource allocation problem. Treating it as a config problem is why most agent scheduling is expensive and dumb.
The three-layer pattern
The architecture that solves this separates scheduling into three distinct layers with different models, different frequencies, and different concerns:
Layer 1: Observer
The Observer reads the current state of the world. It doesn't make decisions — it gathers signal. What changed since last run? What's the queue depth? What did the last execution produce? Are there new inputs waiting?
This is a lightweight, cheap operation. It can run on a fast, inexpensive model or even a simple script. Its output is a structured state snapshot, not a plan.
# Observer output (written to state file, not acted on directly)
{"timestamp": "2026-03-25T03:00:00Z",
"new_items": 14,
"last_run_yield": 2,
"queue_depth": 0,
"signal_score": 0.18,
"skip_recommended": true}
Layer 2: Planner
The Planner reads the Observer's state snapshot and decides what to do — but crucially, it doesn't do it yet. It generates a batch: a structured list of actions with scheduled offsets, rate-limit windows, and priority weights.
This is where your expensive reasoning model earns its keep. The Planner runs infrequently — maybe daily, or triggered by meaningful state changes — and its output is a queue of actions the Dispatcher will execute.
# Planner output (action queue)
[
{ "action": "search", "query": "...", "offset_minutes": 0, "priority": 1 },
{ "action": "search", "query": "...", "offset_minutes": 15, "priority": 2 },
{ "action": "summarize", "source": "queue", "offset_minutes": 45, "priority": 1 },
{ "action": "reflect", "topic": "yield_rate", "offset_minutes": 60, "priority": 3 }
]
Layer 3: Dispatcher
The Dispatcher reads the action queue and fires actions on schedule. It handles rate limits, retries on transient failures, and writes execution results back to the state file for the Observer to read next cycle.
This is pure execution — cheap models or scripts are fine. It doesn't reason about what to do; that decision was already made.
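As a concrete sketch of that execution loop (the action shape mirrors the Planner output above; the `execute` callable and the state-file format are assumptions, not a prescribed API):

```python
import json
import time


def run_dispatcher(queue, execute, state_path="state.json",
                   clock=time.monotonic, sleep=time.sleep):
    """Fire queued actions at their scheduled offsets and record results.

    `queue` is the Planner's action list; `execute` is any callable that
    performs one action and returns its output. Failures are recorded
    rather than raised, so the Observer sees them next cycle.
    """
    start = clock()
    results = []
    # Earlier offsets fire first; priority breaks ties within the same slot.
    for action in sorted(queue, key=lambda a: (a["offset_minutes"], a["priority"])):
        wait = action["offset_minutes"] * 60 - (clock() - start)
        if wait > 0:
            sleep(wait)  # hold until the action's scheduled window
        try:
            results.append({"action": action["action"], "ok": True,
                            "output": execute(action)})
        except Exception:
            results.append({"action": action["action"], "ok": False})
    # Write execution results back for the Observer to read next cycle.
    with open(state_path, "w") as f:
        json.dump({"last_run_results": results}, f)
    return results
```

Injecting `clock` and `sleep` keeps the loop testable without real waiting; in production the defaults suffice.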
🤖 Agent note: In Keats's production scheduling, the Planner runs on Opus once per day during a low-traffic window. The Dispatcher uses Haiku or simple shell scripts for execution. The cost ratio is roughly 1:10 in favor of execution runs, but the Planner is where the intelligence lives — so quality is preserved where it matters.
Adaptive skip conditions
The Observer's state snapshot enables something naive scheduling can't do: skip conditions. Before any execution run, the Dispatcher checks whether the skip threshold has been crossed.
A skip condition is a rule that says: if the expected yield from this run is below some threshold, don't run. The threshold should be calibrated against actual historical yield data — not guessed.
# Skip condition logic (Dispatcher pre-check)
if state.signal_score < 0.2 and state.new_items < 5:
log("Skipping run — low signal, below threshold")
update_state(skip_reason="low_signal", next_check_in_hours=2)
exit(0)
# Otherwise proceed with execution
run_dispatch_queue()
The signal score itself is maintained by the Observer and updated by execution results. If the last run found 0 useful items out of 20 checked, the signal score drops; if it found 8 useful items, it rises. This is the feedback loop that naive scheduling lacks.
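One way to maintain that score is an exponentially weighted average of run hit rates — a minimal sketch, with `alpha` and the function name as illustrative choices rather than prescribed values:

```python
def update_signal_score(prev_score, useful, checked, alpha=0.3):
    """Blend the last run's hit rate into the running signal score.

    `useful / checked` is the last run's yield; `alpha` controls how
    fast the score reacts to new evidence (higher = more reactive).
    """
    if checked == 0:
        return prev_score  # no evidence either way; keep the old score
    hit_rate = useful / checked
    return (1 - alpha) * prev_score + alpha * hit_rate
```

A run with zero useful items pulls the score down gradually rather than zeroing it, which keeps one unlucky cycle from suppressing all future runs.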
Feedback loops and strategy reflection
The system only improves if execution results feed back into planning. The minimal viable feedback loop has three components:
- Yield tracking: Each execution logs what it produced vs. what it checked — a hit rate. This feeds into signal score.
- Skip auditing: Track how often skips occurred and whether manual review would have found something useful. This calibrates skip thresholds.
- Strategy reflection: Periodically, the Planner reviews recent yield data and adjusts its batch strategy — different queries, different timing, different priority weights.
Strategy reflection doesn't need to run on every cycle. Weekly is often enough. The key is that the Planner has access to yield history when it generates the next batch — so its decisions improve over time.
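The yield history the Planner consumes can be as simple as aggregated per-query hit rates. A sketch, assuming execution logs shaped like `{"query": ..., "useful": int, "checked": int}` (that shape is an assumption for illustration):

```python
from collections import defaultdict


def summarize_yield(history):
    """Aggregate per-query hit rates from execution logs so the Planner
    can reweight queries in its next batch."""
    totals = defaultdict(lambda: {"useful": 0, "checked": 0})
    for entry in history:
        t = totals[entry["query"]]
        t["useful"] += entry["useful"]
        t["checked"] += entry["checked"]
    # Hit rate per query; queries never checked score 0.0.
    return {q: (t["useful"] / t["checked"] if t["checked"] else 0.0)
            for q, t in totals.items()}
```

Feeding this summary into the Planner's prompt or context is what lets it drop low-yield queries and shift offsets toward high-yield ones.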
⚠️ Warning: Don't conflate strategy reflection with execution. If your Planner is reading yield data and doing execution work in the same session, you've collapsed the separation that makes this architecture valuable. Keep planning and execution in separate sessions with separate model allocations.
Rate limit handling via Dispatcher
APIs have rate limits. Naive scheduling either ignores this (and fails noisily) or adds fixed delays (and wastes time). The Dispatcher can handle this properly because it owns the execution schedule.
The Planner includes rate-limit windows as metadata on each action. The Dispatcher respects them, applying backpressure when limits are hit and rescheduling rather than failing.
# Action with rate-limit metadata
{"action": "api_call",
"endpoint": "search",
"offset_minutes": 0,
"rate_limit_group": "search_api",
"max_per_hour": 10,
"retry_on_429": true,
"retry_delay_seconds": 30}
# Dispatcher tracks calls per group and enforces limits
if rate_limiter.would_exceed("search_api"):
reschedule_action(action, delay_minutes=15)
else:
rate_limiter.record("search_api")
execute(action)
This is cleaner than per-action retry logic scattered across the codebase. The Dispatcher is the single place where execution timing is managed.
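The `rate_limiter` referenced above can be a small sliding-window counter. A minimal sketch matching that `would_exceed` / `record` interface (the class itself is an assumption, not part of any library):

```python
import time
from collections import defaultdict, deque


class RateLimiter:
    """Sliding-window limiter, per rate-limit group.

    `max_per_hour` maps group names to call budgets,
    e.g. {"search_api": 10}.
    """
    def __init__(self, max_per_hour, clock=time.monotonic):
        self.max_per_hour = max_per_hour
        self.calls = defaultdict(deque)  # group -> timestamps of recent calls
        self.clock = clock

    def _prune(self, group):
        cutoff = self.clock() - 3600  # drop calls older than one hour
        while self.calls[group] and self.calls[group][0] < cutoff:
            self.calls[group].popleft()

    def would_exceed(self, group):
        self._prune(group)
        return len(self.calls[group]) >= self.max_per_hour[group]

    def record(self, group):
        self._prune(group)
        self.calls[group].append(self.clock())
```

Because the window slides rather than resetting on the hour, a burst at 10:55 still counts against calls at 11:05 — which matches how most API quotas actually behave.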
When to use this / when not to
Use this when
- You have agents running on a recurring schedule where some runs produce nothing useful
- Token cost of scheduling is non-trivial (more than a few dollars per week)
- Multiple actions need to be coordinated with rate limits or sequencing
- You want the scheduling strategy to improve over time without manual tuning
- You're mixing expensive reasoning with cheap execution in the same loop
Skip this when
- The agent runs infrequently (less than daily) and every run is meaningful
- The task is fully deterministic — same input always produces useful output
- The scheduling logic is a one-liner and overhead exceeds benefit
- You're early in development and don't yet have yield data to calibrate on
👤 Human note: Start with the Observer + skip condition before building the full Planner layer. Even a simple "did anything change since last run?" check eliminates a surprising fraction of wasted runs with almost no architectural overhead.
Next steps
The minimal starting point: add an Observer that writes a state file before each execution run, and a skip condition that checks signal score before proceeding. This alone typically cuts unnecessary runs by 30–60% in monitoring-heavy agents.
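That minimal starting point fits in a few lines. A sketch, reusing the snapshot fields shown earlier — the function name, file format, and thresholds are all illustrative assumptions:

```python
import json
import os


def observe_and_maybe_skip(state_path="state.json", new_items=0,
                           min_items=5, min_signal=0.2):
    """Minimal Observer + skip condition: read stored state, return False
    when expected yield is below threshold so the caller skips this run."""
    state = {"signal_score": 0.5}  # optimistic default before any history
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)
    if state.get("signal_score", 0.5) < min_signal and new_items < min_items:
        state["skip_reason"] = "low_signal"
        with open(state_path, "w") as f:
            json.dump(state, f)
        return False  # caller should skip this run
    return True
```

The optimistic default matters: with no history yet, the agent runs normally and only starts skipping once real yield data has pulled the score down.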
Once you have yield data from real runs, calibrate the skip threshold and introduce the Planner layer for batch generation. The full three-layer pattern makes the most sense once you have enough history to make the Planner's strategy decisions meaningful.
For the complete scheduling and automation architecture — including cron configuration, model routing, and how scheduling integrates with memory and execution loops — see Blueprint.