System evolution case study

AI Agents

A project-history view of how my agent system evolved from loose automation into a more legible and governable architecture.

Trajectory

From single-agent planning to bounded operations

This work tracks the shift from nondeterministic automation toward a more observable system with better routing, telemetry, and control.

  • Versioned through multiple planning architectures
  • Centered on observability and capacity-aware structure
  • Ended in a governable multi-agent model

Problem / Context

This post traces how my AI-agent work moved from loose daily automation toward a more inspectable operating system for planning, research, and execution.

Thoughts on AI and AI Agents in General

How I feel about AI in general is fairly simple: it is a tool. Like every tool, it needs to be used properly. Also like every tool, a person should get to the point where they are not fully dependent on it. If it is used poorly, it can absolutely dull reasoning and weaken good habits of thought. A person should not be afraid to have AI challenge them, but they also should not let a language model drag them toward views they do not believe are morally or ethically right.

I believe AI is the next great equalizer in assistive technology. AI can simplify or structure language in more accessible ways and make text-to-speech and speech-to-text feel much more natural, among many other possibilities that I find genuinely exciting. Personally, AI functions as a sounding board and as part of my capacity management system. It helps me do things I love while keeping the associated costs legible enough that recovery stays possible. I use an LLM every day to help devise what I am doing, but I can still reconstruct that information later. I know what I am putting into the model and what is coming back out. I use AI to reduce the emotional and cognitive load of work that I can do myself but often should not have to do the hard way.

The interesting question is not just whether AI can do something, but what kind of structure makes that help responsible and useful.

Where I Started

The previous structure of my agentic system was a single-agent, nondeterministic workflow. It brought some structure to my life, but not enough. As a starting point, it helped reduce decision fatigue by integrating calendar data, academic deadlines, and personal commitments into structured daily plans. However, the system only structured what was on my schedule, not necessarily how to begin a task. It also could not accurately estimate how much energy a task would take, which is part of what pushed me to devise the Work Unit system.

I kept iterating. Version two incorporated Work Units and grew into an overbuilt web of Apple Shortcuts, many of them involving LLM calls. It was creative, but it was not deterministic enough. Version three became more engineered and less chaotic by moving more preprocessing into code and narrowing the system back down to a single planning output in the form of a daily briefing email. It was better, but it still had serious blind spots.

The main problem was observability. The system could not track Work Unit expenditure over time in a sufficiently rich way. It could reason about planned events on my calendar, but not about the everyday accumulation of assignments, email, side projects, and other non-calendar work. That gap is what links this history so closely to Field Note 0 - Instrumentation. The system needed better telemetry, not just more automation.

Keeping personal data local also became more important as the telemetry system grew. Ollama looked like a promising fit, but Apple Shortcuts and local model serving were not a great combination, especially when reliability mattered at 04:00 every day. Eventually it became obvious that a different architecture was needed.

Approach / System Design

Version Four: Transition to OpenClaw

This is also a full blog post, particularly about the setup process and workflow. But the short version is that OpenClaw gave me the structure that the earlier versions lacked.

OpenClaw has native local provider support as long as the API in question supports either OpenAI or Anthropic specifications, including documentation on how to use Ollama. I had already experimented with Ollama, so that made OpenClaw feel like a natural place to keep building. OpenClaw is also open source, which matters to me because I do not want to rely on a closed company for the continued availability of a system that increasingly touches sensitive planning and personal infrastructure. I also wanted to experiment with skills and tools with frontier LLMs, and OpenClaw's alignment with the AgentSkills spec gave me a practical framework to build on.

That transition later became The Fleet: a more explicit multi-agent architecture with clearer roles, better routing, and a more governable operational model.

Outcomes / Lessons

Future Work

After migrating to OpenClaw, I wanted a clearer idea of where the system should go next. The telemetry pipeline is here to stay, especially because it will feed future agents' work and tasks. Before the full migration was complete, however, I still needed to improve that telemetry system itself. In particular, I wanted to add Work Unit calculation to Canvas assignment completion and to bring side projects like this blog into the daily plan and Work Unit calculations.

That data is critical to the Fleet. The system only becomes truly useful when it can reason about real load rather than idealized plans. In that sense, the future of the agent system has always depended on the same underlying problem: making state visible enough that planning can be trusted.

Artifacts