Developers are increasingly asking AI agents like Claude to take on complex tasks requiring hours or days of work. However, getting agents to make consistent progress across multiple context windows remains an open problem.
A core challenge is that long-running agents must work in discrete sessions, each starting with no memory of what came before. To bridge this gap, a two-fold harness has emerged: an initializer agent sets up the environment on the first run, and a coding agent makes incremental progress in each subsequent session while leaving clear artifacts behind for the next one.
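The two-phase pattern can be sketched as a small driver that checks for a progress artifact on disk: if none exists, this is the first session (initializer); otherwise a prior session left state to resume from. The file name and state fields below are illustrative assumptions, not the article's actual format.

```python
import json
import os

STATE_FILE = "progress.json"  # hypothetical artifact name left between sessions


def run_session(workdir):
    """Drive one context window: initialize on the first run, resume afterwards."""
    path = os.path.join(workdir, STATE_FILE)
    if not os.path.exists(path):
        # First session: the initializer agent sets up the environment and
        # records a plan that later sessions can pick up.
        state = {"initialized": True, "sessions": 1, "notes": "environment set up"}
    else:
        # Later sessions: the coding agent reads the prior artifacts and continues.
        with open(path) as f:
            state = json.load(f)
        state["sessions"] += 1
    with open(path, "w") as f:
        json.dump(state, f, indent=2)
    return state
```

Because every session starts from the same entry point, the harness itself decides which role the agent plays; no session needs memory of the one before it.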
One key component is a feature list that enumerates all the required features. The initializer agent writes this file from the user's initial prompt, giving the model a clear picture of what needs to be implemented. Coding agents then update the file, flipping each feature's passes field as work is verified, and are prompted to make incremental progress while leaving the environment in a clean state.
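A minimal sketch of such a feature list and the operations on it, assuming a simple JSON-style structure with a boolean `passes` per feature (the actual schema in the article may differ):

```python
# Hypothetical feature-list format: the initializer agent writes entries like
# these, and each coding session updates them as features are verified.
FEATURES = [
    {"id": 1, "description": "user can log in", "passes": False},
    {"id": 2, "description": "dashboard renders saved items", "passes": False},
]


def next_feature(features):
    """Return the first feature whose tests do not yet pass, or None if done."""
    return next((f for f in features if not f["passes"]), None)


def mark_passing(features, feature_id):
    """Flip a feature's passes field once it has been verified end to end."""
    for f in features:
        if f["id"] == feature_id:
            f["passes"] = True
```

Each session can call `next_feature` to choose what to work on, which keeps progress monotonic even though no session remembers the previous one.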
Another crucial aspect is providing testing tools to identify bugs and ensure end-to-end functionality. Developers have found that using browser automation tools dramatically improves performance, as the agent can quickly identify and fix bugs that aren’t obvious from the code alone.
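As a stand-in for the browser automation the article describes, the idea can be illustrated with a self-contained end-to-end smoke check: start the app, fetch a page over HTTP, and assert on what actually renders rather than on the code alone. The tiny server and check below are illustrative assumptions, not the article's tooling.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class AppHandler(BaseHTTPRequestHandler):
    """Stands in for the application under test."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(b"<h1>Dashboard</h1>")

    def log_message(self, *args):  # silence per-request logging
        pass


def smoke_test():
    """Serve the app on a free port, fetch the page, and verify it rendered."""
    server = HTTPServer(("127.0.0.1", 0), AppHandler)  # port 0 = pick a free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        body = urllib.request.urlopen(f"http://127.0.0.1:{port}/").read()
        return b"Dashboard" in body
    finally:
        server.shutdown()
```

A real harness would replace the `urllib` fetch with a browser automation tool so the agent can see rendering and interaction bugs, but the feedback loop is the same: exercise the running app, not just the source.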
The updated Claude 4 prompting guide shares best practices for multi-context window workflows, including a harness structure in which an initializer agent sets up the environment with the necessary context. The article also discusses future work, such as whether specialized agents make better progress across contexts and whether a multi-agent architecture yields better results.
Source: https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents