Final Project — Building a Research Agent from Scratch

Episode 10 · 25 min

Introduction: Time to Build a Real, Complete Agent

Welcome to the final episode of the series. By now you have learned all the concepts — from Agent fundamentals to tools, memory, planning, Multi-Agent, frameworks, security, and testing. Now it is time to put everything together and build a real project.

We are going to build a Research Agent — an Agent that can search the web, collect information, summarize it, and write a complete report, from start to finish.

Overall Architecture

Our Research Agent consists of 4 main components:

  • Planner: Breaks the research task into steps
  • Searcher: Searches the web and returns results
  • Analyzer: Reads texts and extracts important information
  • Writer: Writes the final report

These can be 4 separate Agents (Multi-Agent) or 4 stages in a single Agent. We will go with the single Agent, multi-stage approach — it is simpler and better for learning.
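
To make the single-Agent, multi-stage idea concrete, the four components can be sketched as plain functions chained into a pipeline. This is a minimal sketch with stub bodies; the names and placeholder logic are illustrative, not the final implementation we build below:

```python
# Sketch of the four-stage pipeline. The stage bodies are placeholders;
# the real versions (built in this episode) call an LLM and tools.

def planner(topic: str) -> list[str]:
    """Break the research task into concrete questions."""
    return [f"What is {topic}?", f"What are recent developments in {topic}?"]

def searcher(question: str) -> list[str]:
    """Search the web. Stub: returns fake snippets."""
    return [f"snippet about: {question}"]

def analyzer(snippets: list[str]) -> list[str]:
    """Extract the important points from raw snippets."""
    return [s.upper() for s in snippets]  # placeholder "analysis"

def writer(topic: str, findings: list[str]) -> str:
    """Assemble the final report."""
    return f"# Report: {topic}\n" + "\n".join(f"- {f}" for f in findings)

def research_pipeline(topic: str) -> str:
    findings: list[str] = []
    for question in planner(topic):
        findings.extend(analyzer(searcher(question)))
    return writer(topic, findings)
```

The real Agent interleaves these stages dynamically inside one ReAct loop rather than running them as a fixed pipeline, but the data flow is the same.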

Step 1: Tools

import os
import httpx
import json
from openai import OpenAI
from datetime import datetime

client = OpenAI()  # reads OPENAI_API_KEY from the environment
TAVILY_API_KEY = os.environ["TAVILY_API_KEY"]  # used by search_web below

# Tool 1: Web search
async def search_web(query: str, num_results: int = 5) -> str:
    """Search the web using Tavily API."""
    # Sign up at tavily.com - free plan has 1000 searches per month

    async with httpx.AsyncClient() as http:
        resp = await http.post(
            "https://api.tavily.com/search",
            json={
                "api_key": TAVILY_API_KEY,
                "query": query,
                "max_results": num_results,
                "include_raw_content": False,
                "search_depth": "advanced",
            },
            timeout=30,
        )
        resp.raise_for_status()
        data = resp.json()

    results = []
    for r in data.get("results", []):
        results.append({
            "title": r["title"],
            "url": r["url"],
            "content": r["content"][:500],
        })

    return json.dumps(results, ensure_ascii=False, indent=2)


# Tool 2: Read webpage content
async def read_webpage(url: str) -> str:
    """Reads the content of a web page."""
    async with httpx.AsyncClient() as http:
        try:
            resp = await http.get(
                url,
                follow_redirects=True,
                timeout=15,
                headers={"User-Agent": "Mozilla/5.0 Research-Agent"}
            )

            from html.parser import HTMLParser

            class TextExtractor(HTMLParser):
                def __init__(self):
                    super().__init__()
                    self.text = []
                    self.skip = False

                def handle_starttag(self, tag, attrs):
                    if tag in ["script", "style", "nav"]:
                        self.skip = True

                def handle_endtag(self, tag):
                    if tag in ["script", "style", "nav"]:
                        self.skip = False

                def handle_data(self, data):
                    if not self.skip:
                        text = data.strip()
                        if text:
                            self.text.append(text)

            extractor = TextExtractor()
            extractor.feed(resp.text)
            full_text = "\n".join(extractor.text)

            return full_text[:3000]

        except Exception as e:
            return f"Error reading page: {str(e)}"


# Tool 3: Save notes
notes = []

def save_note(content: str, source: str = "") -> str:
    """Saves a note."""
    note = {
        "id": len(notes) + 1,
        "content": content,
        "source": source,
        "timestamp": datetime.now().isoformat(),
    }
    notes.append(note)
    return f"Note #{note['id']} saved."


def get_all_notes() -> str:
    """Returns all notes."""
    if not notes:
        return "You have no notes yet."
    return json.dumps(notes, ensure_ascii=False, indent=2)


# Tool definitions for OpenAI
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": "Search the internet. Use to find new information.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Search query (English works best)"
                    },
                    "num_results": {
                        "type": "integer",
                        "description": "Number of results",
                        "default": 5
                    }
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "read_webpage",
            "description": "Read the content of a web page",
            "parameters": {
                "type": "object",
                "properties": {
                    "url": {
                        "type": "string",
                        "description": "Web page URL"
                    }
                },
                "required": ["url"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "save_note",
            "description": "Save an important note. Write key information here.",
            "parameters": {
                "type": "object",
                "properties": {
                    "content": {
                        "type": "string",
                        "description": "Note content"
                    },
                    "source": {
                        "type": "string",
                        "description": "Information source (URL or title)"
                    }
                },
                "required": ["content"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_all_notes",
            "description": "See all notes you have saved so far",
            "parameters": {
                "type": "object",
                "properties": {}
            }
        }
    },
]

# Function map
TOOL_MAP = {
    "search_web": search_web,
    "read_webpage": read_webpage,
    "save_note": save_note,
    "get_all_notes": get_all_notes,
}

Step 2: Agent Engine with ReAct Loop

import asyncio

SYSTEM_PROMPT = """You are a Research Agent.
Your job is to research various topics.

Your method:
1. First write a research plan (what questions need answering)
2. Search and collect information
3. Read important pages
4. Save key points in notes
5. When you have enough information, write the final report

Rules:
- Check at least 3 different sources
- Always save key information in notes
- The final report must be structured (title, introduction, sections, conclusion)
- Cite your sources
- When research is complete and the report is ready, start your message with "Final Report:"
"""

MAX_ITERATIONS = 15


async def run_agent(research_topic: str) -> str:
    """Runs the Agent until research is complete."""

    global notes
    notes = []  # Reset notes

    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content":
         f"Research this topic and write a report:\n{research_topic}"},
    ]

    for iteration in range(MAX_ITERATIONS):
        print(f"\n--- Iteration {iteration + 1} ---")

        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=TOOLS,
            temperature=0.3,
        )

        msg = response.choices[0].message

        if msg.tool_calls:
            messages.append({
                "role": "assistant",
                "content": msg.content,
                "tool_calls": [
                    {
                        "id": tc.id,
                        "type": "function",
                        "function": {
                            "name": tc.function.name,
                            "arguments": tc.function.arguments,
                        }
                    }
                    for tc in msg.tool_calls
                ]
            })

            for tool_call in msg.tool_calls:
                func_name = tool_call.function.name
                func_args = json.loads(tool_call.function.arguments)

                print(f"  Tool: {func_name}({func_args})")

                func = TOOL_MAP[func_name]
                if asyncio.iscoroutinefunction(func):
                    result = await func(**func_args)
                else:
                    result = func(**func_args)

                print(f"  Result: {str(result)[:100]}...")

                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": str(result),
                })
        else:
            content = msg.content or ""
            messages.append({"role": "assistant", "content": content})

            if "Final Report" in content:
                print("\nResearch complete!")
                return content

    return "Research reached maximum iteration count. Last status:\n" + messages[-1].get("content", "")

Step 3: Structured Report Generation

async def generate_report(topic: str, research_notes: list) -> str:
    """Generates a professional report from notes."""

    notes_text = "\n".join(
        f"- [{n['source']}] {n['content']}"
        for n in research_notes
    )

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": """
            You are a professional report writer.
            From research notes, write a complete
            and structured report.

            Report structure:
            1. Title
            2. Executive Summary (3-4 sentences)
            3. Introduction
            4. Key Findings (multiple sections)
            5. Analysis and Discussion
            6. Conclusion
            7. Sources

            Write in a professional but readable tone.
            """},
            {"role": "user", "content":
             f"Topic: {topic}\n\nNotes:\n{notes_text}"},
        ],
        temperature=0.4,
        max_tokens=3000,
    )

    return response.choices[0].message.content

Step 4: Full Execution

async def research(topic: str):
    """Complete research process."""

    print(f"Starting research: {topic}")
    print("=" * 60)

    report = await run_agent(topic)

    filename = f"report_{datetime.now():%Y%m%d_%H%M}.md"
    with open(filename, "w") as f:
        f.write(report)

    print(f"\nReport saved: {filename}")
    print(f"Number of notes: {len(notes)}")

    return report

# Run
# asyncio.run(research("The impact of AI on the job market in 2025"))

Step 5: Adding Guardrails

class ResearchGuardrails:
    """Guards specific to the Research Agent."""

    MAX_SEARCHES = 10
    MAX_PAGE_READS = 8
    MAX_COST = 0.50  # dollars
    BLOCKED_DOMAINS = [
        "facebook.com", "instagram.com",
        "tiktok.com",  # Social media are not reliable sources
    ]

    def __init__(self):
        self.search_count = 0
        self.read_count = 0
        self.estimated_cost = 0.0

    def check_search(self, query: str) -> tuple:
        if self.search_count >= self.MAX_SEARCHES:
            return False, "Maximum searches reached."
        self.search_count += 1
        return True, ""

    def check_read(self, url: str) -> tuple:
        if self.read_count >= self.MAX_PAGE_READS:
            return False, "Maximum page reads reached."

        for domain in self.BLOCKED_DOMAINS:
            if domain in url:
                return False, f"Domain {domain} is not allowed."

        self.read_count += 1
        return True, ""

    def check_cost(self, tokens: int) -> tuple:
        cost = tokens * 0.005 / 1000
        self.estimated_cost += cost

        if self.estimated_cost >= self.MAX_COST:
            return False, "Maximum budget reached."
        return True, ""
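
One way to wire these guards into the tool loop is to check the relevant guard before each dispatch and, when a call is blocked, return the reason string as the tool result so the model can adapt instead of crashing. The `guarded_dispatch` helper below is an illustrative sketch, not part of the Agent code above; the class is repeated in condensed form so the snippet runs on its own:

```python
# Sketch: checking guardrails before dispatching a tool call.
# ResearchGuardrails is repeated here (condensed) so the snippet is self-contained.

class ResearchGuardrails:
    MAX_SEARCHES = 10
    MAX_PAGE_READS = 8
    BLOCKED_DOMAINS = ["facebook.com", "instagram.com", "tiktok.com"]

    def __init__(self):
        self.search_count = 0
        self.read_count = 0

    def check_search(self, query: str) -> tuple[bool, str]:
        if self.search_count >= self.MAX_SEARCHES:
            return False, "Maximum searches reached."
        self.search_count += 1
        return True, ""

    def check_read(self, url: str) -> tuple[bool, str]:
        if self.read_count >= self.MAX_PAGE_READS:
            return False, "Maximum page reads reached."
        for domain in self.BLOCKED_DOMAINS:
            if domain in url:
                return False, f"Domain {domain} is not allowed."
        self.read_count += 1
        return True, ""


def guarded_dispatch(guards: ResearchGuardrails,
                     func_name: str, func_args: dict) -> tuple[bool, str]:
    """Return (allowed, reason). Call inside the tool loop before TOOL_MAP lookup."""
    if func_name == "search_web":
        return guards.check_search(func_args.get("query", ""))
    if func_name == "read_webpage":
        return guards.check_read(func_args.get("url", ""))
    return True, ""  # note-taking tools are always allowed
```

In `run_agent`, this check would sit right before `func = TOOL_MAP[func_name]`: if the result is `(False, reason)`, append `reason` as the tool message instead of executing the tool.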

Advanced Improvements

This is the base version. A few ideas to make it better:

1. Long-term memory: Store previous research results in a Vector Database. Next time you research a similar topic, the Agent can also use past results.

2. Multi-Agent: Instead of one Agent, build 3 separate Agents — Searcher, Analyzer, Writer — and put an Orchestrator in charge of coordinating them.

3. More sources: Beyond the web, also search news APIs, Wikipedia, and academic databases (like arXiv).

4. Output format: Convert the report to PDF, HTML, or even slides.

5. Automated evaluation: After writing the report, have another Agent review its quality and rewrite if it is weak.
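
Idea 5 can be sketched as a simple review-and-rewrite loop. In a real system, `review` and `rewrite` would each be an LLM call (a second Agent judging the report); here they are stubs, and the threshold and round limit are illustrative assumptions:

```python
# Sketch of an automated review loop (idea 5). review() and rewrite()
# are stubs standing in for LLM calls.

def review(report: str) -> float:
    """Score report quality 0..1. Stub: longer reports score higher."""
    return min(len(report) / 100, 1.0)

def rewrite(report: str, score: float) -> str:
    """Improve a weak report. Stub: append a placeholder expansion."""
    return report + " [expanded section]"

def review_loop(report: str, threshold: float = 0.8, max_rounds: int = 3) -> str:
    """Rewrite until the reviewer is satisfied or the round budget runs out."""
    for _ in range(max_rounds):
        score = review(report)
        if score >= threshold:
            break
        report = rewrite(report, score)
    return report
```

The round limit matters for the same reason `MAX_ITERATIONS` does in the main loop: without it, a reviewer that is never satisfied would burn budget forever.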

Series Summary — From Episode 1 to 10

Let me give a summary of the entire series:

Episode 1: What is an Agent and how it differs from a Chatbot — we learned that an Agent decides and acts.

Episode 2: Tool Use — we gave it hands and feet.

Episode 3: Memory — we gave it a brain.

Episode 4: Planning — we gave it the power to think.

Episode 5: Multi-Agent — we built a team of Agents.

Episode 6: Frameworks — we explored ready-made tools.

Episode 7: Security — we made the Agent safe.

Episode 8: Telegram bot — we built a real Agent.

Episode 9: Testing and debugging — we learned to make sure it works.

Episode 10 (this one): Research Agent — a complete project from scratch.

Next step: The best way to learn is by building. Pick a personal project and build an Agent. It could be a personal assistant, a data analysis tool, a Telegram bot, or anything else. The important thing is to start.

Thank you for following along through this series. I hope it has helped you deeply understand Agent concepts and implement them in practice. If you have questions, leave them below this episode.