Planning — When the Agent Thinks Before It Acts

Episode 4 · 20 min

Introduction: The Art of Thinking Before Acting

Imagine you are asked to organize a big event. Would you just start randomly calling people, booking venues, and ordering food — all at once with no plan? Of course not. You would first think about it: what needs to be done? In what order? What depends on what?

This is exactly what Planning means for an Agent. An Agent without planning is like a worker who is strong but thoughtless. It might do things, but it does them chaotically and often gets stuck.

In this episode, we explore the main planning patterns in Agents and see how an Agent can think before it acts.

Why Does Planning Matter?

Three main reasons:

1. Complex Tasks Are Multi-step

Most real-world tasks are not single-step. “Build me a website” involves design, coding, testing, and deployment. Without planning, the Agent does not know where to start.

2. Resource Management

Each API call costs money and time. Good planning means fewer unnecessary calls and less waste.

3. Error Recovery

When a step fails, an Agent with a plan can adjust and try alternative approaches. An Agent without a plan just stops.

Pattern 1: ReAct (Reasoning + Acting)

ReAct is one of the most important planning patterns. The idea is simple: before each action, the Agent thinks out loud.

class ReActAgent:
    def __init__(self, tools, llm):
        self.tools = tools
        self.llm = llm

    def solve(self, task: str) -> str:
        prompt = f"""Answer the following question using the available tools.
For each step, first write your Thought, then the Action, then observe the result.

Question: {task}

Format:
Thought: [your reasoning about what to do next]
Action: [tool_name(parameters)]
Observation: [result of the action]
... (repeat as needed)
Thought: I now have the final answer
Final Answer: [your answer]
"""
        # The LLM generates the thought-action chain
        # Your code executes the actions and feeds back observations
        return self._run_loop(prompt)

The ReAct loop:

  1. Thought: Agent thinks about what to do
  2. Action: Agent decides which tool to use
  3. Observation: The result comes back
  4. Repeat until the answer is found

Real-world example: User asks “What is the GDP of Japan and how does it compare to South Korea?” The Agent thinks: “First I need Japan’s GDP, then South Korea’s, then I can compare.” It searches step by step, not all at once.
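
The _run_loop method above is only a stub. Below is a minimal sketch of what such a loop could look like, written as a standalone function for clarity. It assumes tools is a dict mapping tool names to callables that take a single string argument, and that the model writes actions as plain tool_name("argument") text; production code would typically use structured tool calling instead.

import re

def run_react_loop(llm, tools: dict, prompt: str, max_steps: int = 10) -> str:
    """Drive the Thought/Action/Observation loop until a final answer appears."""
    transcript = prompt
    for _ in range(max_steps):
        response = llm.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": transcript}],
            stop=["Observation:"],  # stop before the model invents its own observation
        )
        step = response.choices[0].message.content
        transcript += "\n" + step

        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()

        # Parse an action like: Action: search("GDP of Japan")
        match = re.search(r'Action:\s*(\w+)\((.*)\)', step)
        if not match:
            continue
        tool_name, argument = match.group(1), match.group(2).strip('"\' ')
        observation = tools[tool_name](argument) if tool_name in tools else "Unknown tool"
        transcript += f"\nObservation: {observation}"

    return "No final answer within the step limit"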

Pattern 2: Chain of Thought (CoT)

Chain of Thought is about getting the LLM to show its reasoning step by step before giving a final answer. It is more about reasoning than action.

from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

def chain_of_thought(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": """
            Before answering, think step by step.
            Write your reasoning process, then give the final answer.
            Format:
            Step 1: ...
            Step 2: ...
            ...
            Final Answer: ...
            """},
            {"role": "user", "content": question}
        ]
    )
    return response.choices[0].message.content

CoT is especially effective for math problems, logic puzzles, and multi-step reasoning.
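
For instance, calling it on a small arithmetic word problem (the exact steps the model produces will vary):

question = "A store sells pens at 3 for $2. How much do 12 pens cost?"
print(chain_of_thought(question))
# Typical output shape:
# Step 1: 12 pens is 4 groups of 3 pens.
# Step 2: 4 groups x $2 = $8.
# Final Answer: $8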

Pattern 3: Task Decomposition

Break a complex task into smaller, manageable sub-tasks:

import json

class TaskDecomposer:
    def __init__(self, llm):
        self.llm = llm

    def decompose(self, complex_task: str) -> list:
        """Break a complex task into sub-tasks"""
        response = self.llm.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": """
                Break the given task into smaller, actionable sub-tasks.
                Each sub-task should be:
                - Specific and clear
                - Independently executable
                - In the right order (dependencies considered)

                Return as JSON: {"subtasks": [{"id": 1, "task": "...", "depends_on": []}]}
                """},
                {"role": "user", "content": complex_task}
            ],
            response_format={"type": "json_object"}
        )
        return json.loads(response.choices[0].message.content)["subtasks"]

    def execute_plan(self, subtasks: list) -> dict:
        """Execute sub-tasks in order"""
        results = {}
        for task in subtasks:
            # Check dependencies
            deps_met = all(
                dep_id in results for dep_id in task["depends_on"]
            )
            if not deps_met:
                results[task["id"]] = "Skipped: dependencies not met"
                continue

            # Gather dependency results as context
            context = {
                dep_id: results[dep_id]
                for dep_id in task["depends_on"]
            }
            results[task["id"]] = self._execute_single(
                task["task"], context
            )

        return results
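
The _execute_single helper is left undefined above. A minimal sketch, written as a standalone function for clarity (inside the class it would use self.llm), assuming each sub-task plus the results of its dependencies are simply handed to the LLM:

def execute_single(llm, task: str, context: dict) -> str:
    """Run one sub-task, giving the LLM the results of the sub-tasks it depends on."""
    context_text = "\n".join(
        f"Result of sub-task {dep_id}: {result}"
        for dep_id, result in context.items()
    )
    response = llm.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content":
             "Complete the given sub-task. Use the context from earlier sub-tasks if relevant."},
            {"role": "user", "content": f"Context:\n{context_text}\n\nSub-task: {task}"},
        ],
    )
    return response.choices[0].message.content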

Pattern 4: Self-Reflection

One of the most powerful patterns. After completing an action, the Agent reviews its own work and decides if it needs improvement:

import json

class ReflectiveAgent:
    def __init__(self, llm):
        self.llm = llm

    def solve(self, task: str, max_reflections: int = 3) -> str:
        # Initial attempt
        solution = self._generate_solution(task)

        for i in range(max_reflections):
            # Self-reflection
            reflection = self._reflect(task, solution)

            if reflection["is_satisfactory"]:
                break

            # Improve based on reflection
            solution = self._improve(
                task, solution, reflection["feedback"]
            )
            print(f"Reflection {i+1}: {reflection['feedback'][:100]}")

        return solution

    def _reflect(self, task: str, solution: str) -> dict:
        response = self.llm.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": """
                Review the solution to the given task.
                Is it correct? Complete? High quality?
                Return JSON:
                {"is_satisfactory": true/false, "feedback": "..."}
                """},
                {"role": "user", "content":
                 f"Task: {task}\n\nSolution: {solution}"}
            ],
            response_format={"type": "json_object"}
        )
        return json.loads(response.choices[0].message.content)

Example: Agent writes code to solve a problem. Then it reviews its own code: “This solution works but it does not handle edge cases. I need to add error handling.” Then it improves the code. This cycle can repeat several times.
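
The _generate_solution and _improve methods are not shown above. One plausible sketch, written as standalone functions that take the client explicitly (as methods they would use self.llm instead):

def generate_solution(llm, task: str) -> str:
    """First attempt at the task, with no feedback yet."""
    response = llm.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": task}],
    )
    return response.choices[0].message.content

def improve(llm, task: str, solution: str, feedback: str) -> str:
    """Revise the solution using the reflection feedback."""
    response = llm.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Improve the solution based on the feedback."},
            {"role": "user", "content":
             f"Task: {task}\n\nCurrent solution: {solution}\n\nFeedback: {feedback}"},
        ],
    )
    return response.choices[0].message.content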

Pattern 5: Plan and Execute

First make a complete plan, then execute it step by step:

import json

class PlanAndExecuteAgent:
    def __init__(self, planner_llm, executor_llm, tools):
        self.planner = planner_llm
        self.executor = executor_llm
        self.tools = tools

    def run(self, task: str) -> str:
        # Phase 1: Create the plan
        plan = self._create_plan(task)
        print(f"Plan: {json.dumps(plan, indent=2)}")

        # Phase 2: Execute each step
        # Use an index-based loop so that replanning can actually change the
        # steps that have not run yet (a `for step in plan` loop would keep
        # iterating over the original list even after `plan` is reassigned).
        results = []
        i = 0
        while i < len(plan):
            step = plan[i]
            result = self._execute_step(step, results)
            results.append({
                "step": step,
                "result": result
            })
            i += 1

            # Check if the remaining plan needs adjustment
            if self._needs_replanning(task, results):
                remaining = self._replan(task, results)
                plan = plan[:i] + remaining

        # Phase 3: Compile final answer
        return self._compile_answer(task, results)

The key difference from ReAct: Plan-and-Execute separates planning from execution. The planner can be a stronger model (e.g., GPT-4o) and the executor a faster model (e.g., GPT-4o-mini).
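
None of the helper methods are shown above. As one example, here is a minimal sketch of the planning phase, written as a standalone function and assuming the planner returns a JSON object with a "steps" list; the other helpers would follow the same structure.

import json

def create_plan(planner_llm, task: str) -> list:
    """Ask the planner model for an ordered list of steps before any execution."""
    response = planner_llm.chat.completions.create(
        model="gpt-4o",  # the stronger model does the planning
        messages=[
            {"role": "system", "content":
             'Produce an ordered plan for the given task. '
             'Return JSON: {"steps": ["step 1 description", "step 2 description", ...]}'},
            {"role": "user", "content": task},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)["steps"]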

Combining Patterns

In practice, the best Agents combine multiple patterns:

  • Task Decomposition to break the big task into pieces
  • ReAct for executing each piece
  • Self-Reflection to check the quality of each step
  • Replanning when something goes wrong

class AdvancedAgent:
    def solve(self, task: str) -> str:
        # 1. Decompose
        subtasks = self.decompose(task)

        results = []
        for subtask in subtasks:
            # 2. ReAct loop for each subtask
            result = self.react_solve(subtask)

            # 3. Self-reflection
            quality = self.reflect(subtask, result)
            if not quality["is_satisfactory"]:
                result = self.improve(subtask, result, quality["feedback"])

            results.append(result)

            # 4. Check if remaining plan needs adjustment
            if self.needs_replanning(task, results, subtasks):
                subtasks = self.replan(task, results)

        return self.compile_final(task, results)

Common Planning Pitfalls

1. Over-planning

Sometimes the Agent spends so long planning that it never actually does anything. Set a limit on planning time and iterations.
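
A rough sketch of such a guard, capping both planning rounds and wall-clock time. The planner object and its create_plan, is_good_enough, and refine_plan methods are hypothetical placeholders, and the limits are purely illustrative.

import time

MAX_PLANNING_ROUNDS = 3      # illustrative limits, tune per task
MAX_PLANNING_SECONDS = 30

def plan_with_budget(planner, task: str) -> list:
    """Stop planning once either budget is exhausted and go with the best plan so far."""
    start = time.monotonic()
    plan = planner.create_plan(task)          # hypothetical planner API
    for _ in range(MAX_PLANNING_ROUNDS - 1):
        if time.monotonic() - start > MAX_PLANNING_SECONDS:
            break
        if planner.is_good_enough(plan):      # hypothetical quality check
            break
        plan = planner.refine_plan(task, plan)
    return plan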

2. Rigid Plans

A plan that cannot adapt to new information is useless. Build in checkpoints where the Agent can reassess and adjust.

3. Ignoring Failures

When a step fails, the Agent should not just skip it and move on. It should understand why it failed and either retry or find an alternative.
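
One simple recovery policy is to retry the failed step a couple of times, then ask for an alternative approach instead of silently skipping it. A rough sketch, where execute_step and find_alternative are hypothetical callables supplied by the caller:

def run_step_with_recovery(step, execute_step, find_alternative, max_retries: int = 2):
    """Retry a failed step, then try an alternative approach before giving up."""
    last_error = None
    for _ in range(max_retries):
        try:
            return execute_step(step)
        except Exception as error:  # in practice, catch the specific errors your tools raise
            last_error = error
    # Retries exhausted: look for a different approach rather than skipping the step
    alternative = find_alternative(step, last_error)
    if alternative is not None:
        return execute_step(alternative)
    raise RuntimeError(f"Step failed after {max_retries} retries: {last_error}")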

Important: Planning is not free. Each planning call to the LLM costs tokens. Balance the quality of planning with its cost. For simple tasks, a quick plan is enough. For complex tasks, invest more in planning.

Summary

  • ReAct: Think, then act, then observe — repeat
  • Chain of Thought: Step-by-step reasoning before answering
  • Task Decomposition: Break complex tasks into sub-tasks
  • Self-Reflection: Review your own work and improve
  • Plan and Execute: First plan completely, then execute
  • The best Agents combine multiple patterns

Next episode: Multi-Agent Systems — when multiple Agents work together as a team. One of the most exciting topics in AI Agent development!