How to Keep Your AI Agents on Task

How to keep your AI agents on task: preventing context drift and ADD-like behavior

The agent that worked perfectly on run one and produces sprawling, off-task outputs by run twenty has not forgotten what it is supposed to do.

It has accumulated enough competing context that the original task instruction is no longer the dominant signal in the window. The AI system owner is the role responsible for catching this drift before it reaches production.

This is context drift. It is one of the most predictable failure modes in production AI agent workflows. It is related to, but distinct from, context rot, which degrades agents through outdated information rather than accumulated noise.

The diagnosis is almost always the same. The fix is almost always the same. Neither requires a technical background to implement.

The four drift patterns: diagnosing which one is happening

Pattern 1: Scope expansion

What it looks like: the agent was asked to produce a 200-word client summary. It produces 600 words and adds a section on “recommended next steps” nobody asked for. Or it was asked to extract five specific fields from a document and returns twelve.

Why it happens: the task instruction defines what to produce but not what to exclude. The model interprets the absence of boundaries as permission to add value.

The diagnostic question: “Did my task instruction explicitly define what the output should NOT include?” If not, scope expansion is predictable.
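For field-extraction workflows, scope expansion can be spotted mechanically by comparing the fields the task requested with the fields the agent returned. The sketch below assumes the agent's output has already been parsed into a dictionary; the function name and the sample fields are hypothetical, chosen only to mirror the “asked for five, got more” example above.

```python
# Hypothetical helper: flags scope expansion in a field-extraction task.
# Assumes the agent's output has already been parsed into a dict.


def find_scope_expansion(requested_fields: list[str], output: dict) -> dict:
    """Compare the fields the task asked for with the fields the agent returned."""
    requested = set(requested_fields)
    returned = set(output)
    return {
        "extra": sorted(returned - requested),    # fields nobody asked for
        "missing": sorted(requested - returned),  # fields the task required but did not get
    }


# Illustrative data only: five requested fields, seven returned.
requested = ["client_name", "invoice_number", "amount", "due_date", "currency"]
agent_output = {
    "client_name": "Example Client",
    "invoice_number": "INV-001",
    "amount": 1200,
    "due_date": "2024-07-01",
    "currency": "EUR",
    "payment_terms": "Net 30",
    "recommended_next_steps": "Send a reminder",
}

report = find_scope_expansion(requested, agent_output)
print(report)  # {'extra': ['payment_terms', 'recommended_next_steps'], 'missing': []}
```

A non-empty `extra` list is the scope-expansion signal; a non-empty `missing` list points to a different problem (the task was not completed).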

Pattern 2: Instruction decay

What it looks like: the agent followed the formatting instruction correctly in the first five outputs. By output fifteen, the formatting has reverted to the model’s default. The original instruction is technically still in the context window, but it is buried under fifteen rounds of accumulated output.

Why it happens: recency bias. Models tend to weight recent content more heavily than older content. In a long session, the initial instruction is among the oldest content in the window, and the most recent output becomes the implicit template the model follows.

The diagnostic question: “Is the output quality degrading over the course of a single session without any change in the inputs?” If yes, instruction decay is likely.

Pattern 3: Cross-task contamination

What it looks like: the agent is running a client follow-up email workflow. The email for client B references details specific to client A, because client A’s details were still in the session history from an earlier run.

Why it happens: prior session outputs that were not properly cleared are still in the context window. The model treats all context window content as potentially relevant, including content from two runs ago.

The diagnostic question: “Are outputs from earlier runs still present in the context when the new run starts?” If yes, cross-task contamination is predictable.

Pattern 4: Context flooding

What it looks like: the agent produces outputs that are technically correct but generic. A client proposal that sounds like it was written for any company in the industry. A support response that covers every possible interpretation of the question rather than the most likely one.

Why it happens: too much background context relative to the specific task instruction. When the context window contains the full context pack, the full client history, and the full product documentation, the task instruction is a small signal in a large context.

The diagnostic question: “Is the context-to-task-instruction ratio in this session heavily weighted toward context?” If yes, context flooding is reducing task specificity.
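The context-to-task-instruction ratio can be estimated roughly before a run is sent. This is a minimal sketch under two stated assumptions: word counts stand in for tokens (a real tokenizer would be more precise), and the threshold of 50 is a hypothetical cut-off, not a recommended value.

```python
# Rough sketch: estimate how heavily a prompt is weighted toward background context.
# Word counts are used as a stand-in for tokens.

FLOODING_THRESHOLD = 50  # hypothetical cut-off; tune per workflow


def context_to_instruction_ratio(context_sections: list[str], task_instruction: str) -> float:
    """Return (words of background context) / (words of task instruction)."""
    context_words = sum(len(section.split()) for section in context_sections)
    instruction_words = len(task_instruction.split())
    if instruction_words == 0:
        raise ValueError("task instruction is empty")
    return context_words / instruction_words


# Placeholder text standing in for a full context pack (3,000 words)
# and a full client history (2,000 words).
context = ["word " * 3000, "word " * 2000]
instruction = "Write a follow-up email under 150 words covering three points."

ratio = context_to_instruction_ratio(context, instruction)
print(round(ratio), ratio > FLOODING_THRESHOLD)  # 500 True
```

A ratio that keeps climbing from run to run is the pattern to watch for, regardless of where the threshold is set.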

The three controls: what to adjust for each pattern

Control 1: Task scoping (fixes scope expansion and context flooding)

Task scoping means making the task instruction explicit about both what to produce and what not to produce.

Before (scope-ambiguous):

“Write a follow-up email to the client after today’s call.”

After (scope-specific):

“Write a follow-up email to the client after today’s call. The email must be under 150 words. It must cover exactly three points: (1) what was agreed on the call, (2) the next action and who owns it, (3) the date of the next check-in. Do not include pleasantries beyond a single opening sentence. Do not add any additional points or recommendations.”

The after instruction tells the model not just what to include but what to exclude. This is the fix for scope expansion.

For context flooding: scope the context to match the task. If the task is writing a follow-up email, load the voice guide and the specific client’s archetype, not the full context pack, full product documentation, and full client history.

The context loaded should be proportionate to what the specific task requires.
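One way to keep context proportionate is an explicit allow-list of context sections per task type, so nothing else can be loaded by default. The task types, section names, and the `select_context` helper below are hypothetical; the map encodes the follow-up email example above.

```python
# Hypothetical task-to-context map: each task type lists the only sections it may load.
CONTEXT_FOR_TASK = {
    "follow_up_email": ["voice_guide", "client_archetype"],
    "proposal_draft": ["voice_guide", "client_archetype", "product_docs"],
}


def select_context(task_type: str, context_pack: dict[str, str]) -> dict[str, str]:
    """Return only the context sections this task type is allowed to load."""
    needed = CONTEXT_FOR_TASK[task_type]  # a KeyError here means the task type is unmapped
    return {name: context_pack[name] for name in needed if name in context_pack}


pack = {
    "voice_guide": "...",
    "client_archetype": "...",
    "product_docs": "...",
    "client_history": "...",
}
scoped = select_context("follow_up_email", pack)
print(list(scoped))  # ['voice_guide', 'client_archetype']
```

Failing loudly on an unmapped task type is deliberate: it forces someone to decide what that task needs instead of silently loading everything.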

Control 2: Context pruning (fixes instruction decay and cross-task contamination)

Context pruning means actively managing what is in the context window rather than allowing it to accumulate.

For instruction decay: place the most important instruction at the end of the prompt, not the beginning. Recency bias works in your favor when the instruction is the last thing in the context window before the model generates output.

Standard instruction placement format:

[Load: relevant context sections — limited to what this task needs]
[Load: session handover from prior run — structured summary only, not full output]
[Current input: the specific data or document for this run]

TASK INSTRUCTION: [The specific, bounded task — placed last]

Placing the task instruction last rather than first is one of the simplest prompt adjustments that reduces instruction decay.
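The placement format can be enforced in whatever step assembles the prompt, so the instruction always lands last. This is a sketch only; the section labels and the `build_prompt` function are hypothetical and not tied to any particular platform.

```python
def build_prompt(context_sections: list[str], handover_summary: str,
                 current_input: str, task_instruction: str) -> str:
    """Assemble a run prompt so the task instruction is the last thing the model reads."""
    parts = [
        "CONTEXT:\n" + "\n\n".join(context_sections),            # only what this task needs
        "SESSION HANDOVER (summary only):\n" + handover_summary,  # never full prior output
        "CURRENT INPUT:\n" + current_input,
        "TASK INSTRUCTION:\n" + task_instruction,                  # placed last on purpose
    ]
    return "\n\n".join(parts)


prompt = build_prompt(
    context_sections=["Voice guide: plain, direct, no jargon."],
    handover_summary="Run 3: agreed revised timeline; owner: account lead.",
    current_input="Call notes: client approved phase 2 scope.",
    task_instruction="Write a follow-up email under 150 words covering exactly three points.",
)
print(prompt)
```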

For cross-task contamination: never carry full prior run output into the next run. Between runs, the agent produces a structured summary, and only that summary travels forward. The full output is archived, not accumulated.
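The archive-versus-handover split can be sketched as a small run log: the full output goes to storage, and only a structured summary is exposed to the next prompt. The summary fields (decisions, open items) are an assumption about what a useful handover contains, and all names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RunSummary:
    run_id: int
    client: str
    decisions: list[str]
    open_items: list[str]


@dataclass
class RunLog:
    archive: list[str] = field(default_factory=list)  # full outputs, kept out of prompts
    handover: Optional[RunSummary] = None              # the only thing that travels forward

    def close_run(self, full_output: str, summary: RunSummary) -> None:
        self.archive.append(full_output)
        self.handover = summary

    def handover_text(self) -> str:
        """Render the handover block for the next run's prompt."""
        if self.handover is None:
            return "No prior run."
        s = self.handover
        return (f"Run {s.run_id} ({s.client}). "
                f"Decisions: {', '.join(s.decisions)}. "
                f"Open items: {', '.join(s.open_items)}.")


log = RunLog()
log.close_run(
    full_output="(600-word email draft for Client A ...)",
    summary=RunSummary(run_id=1, client="Client A",
                       decisions=["timeline moved to Q3"], open_items=["send revised quote"]),
)
print(log.handover_text())
# Run 1 (Client A). Decisions: timeline moved to Q3. Open items: send revised quote.
```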

Control 3: Session hygiene (fixes instruction decay and cross-task contamination at the architectural level)

Session hygiene means resetting the session at appropriate intervals rather than accumulating context indefinitely.

The session boundary rule: when the task changes materially (different client, different document type, different workflow step), start a new session. Load the clean context for the new task rather than carrying accumulated context from the previous one.

The session length limit: for workflows that run continuously on similar inputs, limit each session to 10–15 runs before starting a fresh session with the original clean context. This prevents instruction decay from accumulating across a long batch run.

Implementation: in Make or Zapier workflows, add a counter that tracks the number of runs in a session. When the counter reaches the limit, the workflow creates a new session before the next run rather than continuing the existing one.
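The session boundary rule and the run counter combine into one decision per run: open a new session if the task changed materially or the run limit was reached. The Python sketch below only mirrors that logic; it is not a Make or Zapier configuration, and `TaskKey`, `SessionManager`, and the default limit of 12 are hypothetical choices within the 10–15 range above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaskKey:
    """What counts as 'the same task' for the session boundary rule."""
    client: str
    document_type: str
    workflow_step: str


class SessionManager:
    def __init__(self, max_runs: int = 12):
        self.max_runs = max_runs
        self.session_id = 0
        self.runs_in_session = 0
        self.current_task: Optional[TaskKey] = None

    def start_run(self, task: TaskKey) -> int:
        """Return the session id for this run, opening a fresh session when needed."""
        first_run = self.current_task is None
        task_changed = not first_run and task != self.current_task
        limit_hit = self.runs_in_session >= self.max_runs
        if first_run or task_changed or limit_hit:
            self.session_id += 1      # new session: reload the clean context here
            self.runs_in_session = 0  # the counter resets with each new session
        self.current_task = task
        self.runs_in_session += 1
        return self.session_id


mgr = SessionManager(max_runs=3)
email_a = TaskKey("Client A", "email", "follow-up")
ids = [mgr.start_run(email_a) for _ in range(4)]
print(ids)  # [1, 1, 1, 2]
print(mgr.start_run(TaskKey("Client B", "email", "follow-up")))  # 3
```

The fourth run for Client A opens session 2 because the limit was hit; the first run for Client B opens session 3 because the task changed.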

The agent instruction architecture: how to write task instructions that stay focused

The instruction architecture that minimizes context drift has five components, in this order.

Component 1: Role statement (one sentence)

“You are a [specific role] for [company name]. Your function is [specific, bounded function].”

Not “You are a helpful AI assistant.” That is too broad and invites scope expansion.

Component 2: Specific context (only what this task needs)

Load the minimal context required for this task: the relevant sections of the context pack, the specific client’s archetype, and the document being processed. Not everything; only what this specific run requires.

Component 3: The task (specific, bounded, with exclusions)

Produce [specific output] from [specific input].
The output must [explicit requirements].
The output must not [explicit exclusions].
The output is complete when [specific completion criterion].

The completion criterion is the most underused element. Without it, the model decides when the output is complete, which produces scope expansion. With it, the model stops when the defined condition is met.

Component 4: Output format (explicit)

“Format the output as follows: [exact structure, labels, and length constraints].”

When the output format is explicit, the model has less latitude to drift into additional sections or length expansions.

Component 5: Boundary statement (what this agent does not do)

“If the input contains [categories outside the task scope], do not process them. Return only the structured output below. Do not add commentary, recommendations, or content outside the specified format.”

The boundary statement is the explicit scope fence. It is the single most effective instruction addition for preventing scope expansion.
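Keeping the five components as separate fields makes it hard to ship an instruction that silently drops one of them, such as the exclusions or the completion criterion. This is a minimal sketch that renders the templates above in the stated order; the class name, field names, and example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AgentInstruction:
    role: str
    company: str
    function: str
    context: list[str]
    output: str
    source: str
    requirements: list[str]
    exclusions: list[str]
    completion: str
    output_format: str
    out_of_scope: str

    def render(self) -> str:
        lines = [
            # Component 1: role statement
            f"You are a {self.role} for {self.company}. Your function is {self.function}.",
            "",
            # Component 2: only the context this task needs
            *self.context,
            "",
            # Component 3: the task, with requirements, exclusions, and completion criterion
            f"Produce {self.output} from {self.source}.",
            *(f"The output must {r}." for r in self.requirements),
            *(f"The output must not {e}." for e in self.exclusions),
            f"The output is complete when {self.completion}.",
            "",
            # Component 4: explicit output format
            f"Format the output as follows: {self.output_format}.",
            "",
            # Component 5: boundary statement
            f"If the input contains {self.out_of_scope}, do not process them. "
            "Return only the structured output described above. Do not add commentary, "
            "recommendations, or content outside the specified format.",
        ]
        return "\n".join(lines)


instruction = AgentInstruction(
    role="client communications assistant",
    company="ExampleCo",
    function="drafting post-call follow-up emails",
    context=["Voice guide: plain, direct, no jargon."],
    output="a follow-up email",
    source="the call notes provided",
    requirements=["be under 150 words", "cover exactly three points"],
    exclusions=["include recommendations or extra points"],
    completion="the three required points have each been stated once",
    output_format="one opening sentence, three numbered points, one-line sign-off",
    out_of_scope="pricing or contract questions",
)
print(instruction.render())
```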

The monitoring system: how to catch drift before it becomes a quality problem

The drift detection checklist (run weekly on each active workflow):

Ask five questions about the last 10 outputs from each workflow:

Are outputs consistently within the specified length range? (Scope expansion signal: outputs trending longer)
Is formatting consistent across the last 10 outputs? (Instruction decay signal: format drifting from specification)
Do any outputs reference content from other clients, projects, or runs? (Cross-task contamination signal)
Are outputs specific to the input or generic? (Context flooding signal: outputs that could have been produced for any input)
Has the acceptance rate on this workflow changed in the last two weeks? (General drift signal: declining acceptance suggests one or more patterns present)

Any “no” to the first two questions, “yes” to the third, “generic” to the fourth, or “declining” to the fifth means the workflow needs a diagnostic review before the next batch run.
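Parts of the weekly checklist can be pre-screened automatically before a human review. The sketch below covers questions 1, 2, 3, and 5 under stated assumptions: format consistency is approximated by checking that required labels appear in every output, and `other_clients` holds the names of clients other than the one the run was for. Question 4 (specific versus generic) is left to human judgment. All names and sample data are hypothetical.

```python
import re


def drift_flags(outputs: list[str], min_words: int, max_words: int,
                required_labels: list[str], other_clients: list[str],
                acceptance_before: float, acceptance_now: float) -> list[str]:
    """Return the drift signals raised by a batch of recent outputs."""
    flags = []
    # Q1: length outside the specified range suggests scope expansion.
    if any(not (min_words <= len(o.split()) <= max_words) for o in outputs):
        flags.append("scope expansion: length outside range")
    # Q2: missing required labels suggests the format is drifting.
    if any(not all(label in o for label in required_labels) for o in outputs):
        flags.append("instruction decay: format drifting")
    # Q3: another client's name in an output suggests contamination.
    patterns = [re.compile(rf"\b{re.escape(name)}\b") for name in other_clients]
    if any(p.search(o) for o in outputs for p in patterns):
        flags.append("cross-task contamination: other client referenced")
    # Q5: a falling acceptance rate is a general drift signal.
    if acceptance_now < acceptance_before:
        flags.append("general drift: acceptance rate declining")
    return flags


recent = [
    "Agreed: revised scope. Next: send draft by Friday.",
    "Agreed: budget cap. Next: confirm with Acme Ltd.",
]
flags = drift_flags(recent, min_words=3, max_words=150,
                    required_labels=["Agreed:", "Next:"],
                    other_clients=["Acme Ltd"],
                    acceptance_before=0.9, acceptance_now=0.8)
print(flags)
# ['cross-task contamination: other client referenced', 'general drift: acceptance rate declining']
```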

The fix priority:

Immediate: review the last five outputs of any flagged workflow for quality before they are used
Session reset: start the next run on a clean session rather than continuing the accumulated one
Instruction review: identify whether the drift is caused by scope ambiguity, missing exclusions, or instruction placement, and fix the specific element

Common questions on AI agent context drift

“Is context drift the same as hallucination?”

No. Hallucination is the model inventing information that was never provided. Context drift is the model producing outputs that are technically supported by everything in the context, but the context has accumulated to the point where the task instruction is no longer the dominant signal. The outputs are not invented. They are distracted.

“Does this happen with all AI models or just some?”

All models with context windows experience drift patterns when context management is poor. The degree varies by model. But scope expansion, instruction decay, and context flooding are universal consequences of poor context management, not model-specific bugs. This applies equally when sharing AI memory across models on the same project.

“How do I know if my agent is drifting or the model has genuinely changed?”

Run the first 10 outputs of the session on a clean context load. If quality is high on a clean session and degrades over a long session: context drift. If quality is low even on a clean session compared to historical performance: model drift or context pack degradation.
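The comparison above reduces to a small decision rule once each session's outputs have a quality score. This sketch assumes hypothetical 0 to 1 quality ratings (for example, the share of outputs accepted) and a hypothetical tolerance of 0.05 so that normal run-to-run noise does not trigger a diagnosis.

```python
def diagnose(clean_session_score: float, long_session_score: float,
             historical_score: float, tolerance: float = 0.05) -> str:
    """Separate context drift from model or context-pack problems."""
    clean_ok = clean_session_score >= historical_score - tolerance
    if not clean_ok:
        # Quality is low even on a clean session: the problem is not accumulation.
        return "model drift or context pack degradation"
    if long_session_score < clean_session_score - tolerance:
        # Clean session is fine, long session degrades: accumulated context is the cause.
        return "context drift"
    return "no drift detected"


print(diagnose(0.9, 0.7, 0.9))   # context drift
print(diagnose(0.7, 0.7, 0.9))   # model drift or context pack degradation
print(diagnose(0.9, 0.88, 0.9))  # no drift detected
```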

“Can I automate the session reset?”

Yes. In Make or Zapier, a counter variable increments with each workflow run. When it hits the session limit (10–15 runs), a conditional branch triggers a new session creation before the next run. The counter resets. This is buildable in 30–60 minutes without coding.

“Does starting a new session lose important context?”

No, provided the structured session summary architecture is in place. The session summary (produced at the end of each session) travels forward into the new session, while the full accumulated output history is archived. The new session starts with clean context plus the structured summary of what the previous session produced.

“What is the maximum number of runs before I should reset a session?”

Task type	Recommended session limit
Complex, nuanced tasks (proposal drafting, detailed analysis)	5–8 runs
Most business workflows	10–15 runs
Short, structured tasks (simple classification, field extraction)	20–25 runs

Want your agent workflows designed with drift prevention built in, rather than diagnosed after drift appears?

AI agents do not drift because the model is unreliable. They drift because context management is hard and most workflow architectures do not actively manage it.

The four drift patterns are predictable. The fixes are specific: scope the task instruction explicitly, prune the context to what the task needs, reset the session at the right intervals, and monitor the outputs for drift signals before they compound.

A workflow designed with these controls from the start produces consistent outputs at run 200 that match run 1. One that is not will produce drift by run 20. Once drift is resolved, building a feedback loop that makes agents self-improving is the natural next step toward AI-native operations that compound in quality over time. See also how to prevent AI agent memory bloat for related architecture guidance.

Path one: audit your three most active workflows today. Run the five-question drift detection checklist on each one. Any that fail are already drifting. The session reset and instruction review steps above will restore quality within the next batch run.

Path two: bring in a partner. If you want your agent workflows redesigned with drift prevention built into the instruction architecture from the ground up, that is the work Phos AI Labs does in Phase 4. In 400+ AI implementations, the companies that get this right all did the same thing first. The fastest way to know if it is the right fit is a conversation. Thirty minutes, no deck. Start here.