Beyond Chatbots: I Built an AI Agent System That Plans, Acts, and Improves Itself

Ahmad Tariq

How I moved from simple prompts to an AI workflow that understands goals, uses tools, checks results, and learns from mistakes

Most AI apps today are still just chat boxes wearing expensive clothes.

You type something. The model replies. Everyone says, “Wow.” Then you ask it to actually complete a messy real-world task, and suddenly the magic starts leaking.

Because answering is not the same as doing.

A chatbot can explain how to analyze sales data. An AI agent should open the file, inspect the columns, clean the data, run the analysis, generate the report, check if the result makes sense, and tell you what changed.

That is the difference.

So instead of building another chatbot, I started building an AI agent system.

Not a toy agent that says “I will do this” and then hallucinates half the plan.

I mean a system that can:

- Understand a goal
- Break it into steps
- Choose tools
- Execute actions
- Observe results
- Retry when something fails
- Reflect on its own output
- Produce a final answer with evidence

That is where AI gets interesting.

1. The real upgrade: from response generation to task execution

A chatbot is usually built like this:

"""
Simple Chatbot Flow
User message
    -> LLM
    -> Response
"""

That works for conversation.

But real work needs a different structure:

"""
AI Agent Flow
User goal
    -> Understand intent
    -> Create plan
    -> Select tools
    -> Execute step
    -> Observe result
    -> Update plan
    -> Continue or stop
    -> Final output
"""

The biggest change is this:

The model is no longer just writing text. It is controlling a workflow.

That workflow needs memory, tools, validation, logging, and stopping rules.

Because an agent without stopping rules is not an assistant. It is a confused intern with unlimited coffee.
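To make "stopping rules" concrete, here is a minimal sketch of what such a check could look like. The `StopRules` class, its limits, and the `stop_reason` helper are hypothetical names chosen for illustration, not part of the system built later in this article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StopRules:
    """Hypothetical limits for one agent run; the values are illustrative."""
    max_steps: int = 8
    max_consecutive_failures: int = 2
    max_seconds: float = 60.0


def stop_reason(rules: StopRules, steps_taken: int,
                consecutive_failures: int, elapsed_seconds: float) -> Optional[str]:
    """Return why the run should stop, or None if it may continue."""
    if steps_taken >= rules.max_steps:
        return "step limit reached"
    if consecutive_failures >= rules.max_consecutive_failures:
        return "too many consecutive failures"
    if elapsed_seconds >= rules.max_seconds:
        return "time budget exhausted"
    return None


rules = StopRules()
# Early in a healthy run: nothing tells the agent to stop.
print(stop_reason(rules, steps_taken=3, consecutive_failures=0, elapsed_seconds=5.0))
# After eight steps the step limit kicks in.
print(stop_reason(rules, steps_taken=8, consecutive_failures=0, elapsed_seconds=5.0))
```

Returning a reason instead of a bare boolean means the log can say why a run ended, which helps when debugging a run that stopped early.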

2. Designing the agent state

Before building the agent, I need one clean object that tracks everything.

What is the goal? What steps are planned? Which steps are done? Which tools were used? What failed? What should happen next?

from dataclasses import dataclass, field
from typing import List, Dict, Any, Optional
from enum import Enum
from datetime import datetime, timezone
from uuid import uuid4

# Lifecycle of a single step in the plan.
class StepStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"
    SKIPPED = "skipped"

# Lifecycle of the whole agent run.
class AgentStatus(str, Enum):
    PLANNING = "planning"
    EXECUTING = "executing"
    REFLECTING = "reflecting"
    COMPLETED = "completed"
    FAILED = "failed"

@dataclass
class AgentStep:
    step_id: str
    description: str
    tool_name: Optional[str] = None
    tool_input: Dict[str, Any] = field(default_factory=dict)
    status: StepStatus = StepStatus.PENDING
    result: Optional[Dict[str, Any]] = None
    error: Optional[str] = None

@dataclass
class AgentState:
    run_id: str
    user_goal: str
    status: AgentStatus
    plan: List[AgentStep] = field(default_factory=list)
    observations: List[str] = field(default_factory=list)
    final_output: Optional[str] = None
    # Timezone-aware UTC timestamp; datetime.utcnow() is deprecated since Python 3.12.
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    max_steps: int = 8  # hard cap on how many plan steps may run
    current_step_index: int = 0
def create_agent_state(user_goal: str) -> AgentState:
    return AgentState(
        run_id=str(uuid4()),
        user_goal=user_goal,
        status=AgentStatus.PLANNING
    )

state = create_agent_state(
    "Analyze sales data, identify trends, and generate a short business report."
)
print(state)

This might look boring, but it is important.

A serious AI agent needs state.

Without state, the agent forgets what it has done. With state, the agent can reason across multiple steps.

That is when it starts feeling less like a chatbot and more like a worker.
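One practical payoff of keeping everything in a single state object is that a run can be written to a log as one JSON line. The sketch below shows one way to do that. `MiniState` is a trimmed, hypothetical stand-in for `AgentState` so the example runs on its own, and `snapshot` is an illustrative helper name.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import List


class AgentStatus(str, Enum):
    PLANNING = "planning"
    EXECUTING = "executing"


@dataclass
class MiniState:
    """Trimmed, hypothetical stand-in for AgentState."""
    run_id: str
    user_goal: str
    status: AgentStatus
    observations: List[str] = field(default_factory=list)
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def snapshot(state: MiniState) -> str:
    # asdict() walks the dataclass recursively. default=str covers the
    # datetime field, which json cannot encode on its own. The str-based
    # enum is written as its plain string value.
    return json.dumps(asdict(state), default=str)


state = MiniState(run_id="run-1", user_goal="Analyze sales data",
                  status=AgentStatus.PLANNING)
state.observations.append("Created a plan with 3 steps.")

restored = json.loads(snapshot(state))
print(restored["status"], restored["observations"])
```

Because the enums inherit from `str`, the snapshot stays readable without a custom encoder.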

3. Giving the agent tools instead of just prompts

An AI agent becomes useful when it can use tools.

For this example, I will create three tools:

- A data inspector
- A sales analyzer
- A report generator

from abc import ABC, abstractmethod
import pandas as pd
from pathlib import Path

class ToolResult:
    def __init__(self, success: bool, data: Dict[str, Any], error: Optional[str] = None):
        self.success = success
        self.data = data
        self.error = error

class Tool(ABC):
    name: str
    description: str
    @abstractmethod
    def run(self, tool_input: Dict[str, Any]) -> ToolResult:
        pass

class DataInspectorTool(Tool):
    name = "data_inspector"
    description = "Inspects a CSV file and returns columns, row count, and missing values."
    def run(self, tool_input: Dict[str, Any]) -> ToolResult:
        try:
            file_path = Path(tool_input["file_path"])
            df = pd.read_csv(file_path)
            return ToolResult(
                success=True,
                data={
                    "columns": list(df.columns),
                    "row_count": len(df),
                    "missing_values": df.isna().sum().to_dict(),
                    "sample_rows": df.head(3).to_dict(orient="records")
                }
            )
        except Exception as error:
            return ToolResult(
                success=False,
                data={},
                error=str(error)
            )

class SalesAnalyzerTool(Tool):
    name = "sales_analyzer"
    description = "Analyzes sales data and returns revenue trends and top products."
    def run(self, tool_input: Dict[str, Any]) -> ToolResult:
        try:
            file_path = Path(tool_input["file_path"])
            df = pd.read_csv(file_path)
            df["date"] = pd.to_datetime(df["date"], errors="coerce")
            df["revenue"] = pd.to_numeric(df["revenue"], errors="coerce").fillna(0)
            monthly_revenue = (
                df.groupby(df["date"].dt.to_period("M"))["revenue"]
                .sum()
                .reset_index()
            )
            monthly_revenue["date"] = monthly_revenue["date"].astype(str)
            top_products = (
                df.groupby("product")["revenue"]
                .sum()
                .sort_values(ascending=False)
                .head(5)
                .to_dict()
            )
            return ToolResult(
                success=True,
                data={
                    "total_revenue": float(df["revenue"].sum()),
                    "average_order_value": float(df["revenue"].mean()),
                    "monthly_revenue": monthly_revenue.to_dict(orient="records"),
                    "top_products": top_products
                }
            )
        except Exception as error:
            return ToolResult(
                success=False,
                data={},
                error=str(error)
            )

class ReportGeneratorTool(Tool):
    name = "report_generator"
    description = "Generates a simple business report from analysis results."
    def run(self, tool_input: Dict[str, Any]) -> ToolResult:
        try:
            analysis = tool_input["analysis"]
            report = f"""
Sales Performance Report
Total Revenue:
{analysis["total_revenue"]}
Average Order Value:
{analysis["average_order_value"]:.2f}
Top Products:
{analysis["top_products"]}
Monthly Revenue Trend:
{analysis["monthly_revenue"]}
Summary:
The sales data shows the strongest products and monthly revenue movement.
The business should focus on high-performing products and investigate months
where revenue dropped or stayed flat.
"""
            return ToolResult(
                success=True,
                data={
                    "report": report.strip()
                }
            )
        except Exception as error:
            return ToolResult(
                success=False,
                data={},
                error=str(error)
            )

Notice the important part:

Each tool has one job.

This keeps the agent clean. The agent thinks and coordinates. The tools do the actual work.

That separation matters.

If the model tries to do everything, the system becomes messy. If tools do specific tasks, the agent becomes more reliable.
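One way to make a single-purpose tool more reliable is to check its input contract before doing any work. The `SalesAnalyzerTool` above reads the `date`, `product`, and `revenue` columns, so a missing column surfaces only as a terse `KeyError` message. The sketch below is a hypothetical pre-check (the `missing_columns` helper is not part of the tools above) that uses only the standard library.

```python
import csv
import io
from typing import Iterable, List

# Columns the SalesAnalyzerTool reads from the file.
REQUIRED_COLUMNS = ("date", "product", "revenue")


def missing_columns(csv_text: str,
                    required: Iterable[str] = REQUIRED_COLUMNS) -> List[str]:
    """Return the required column names that are absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


print(missing_columns("date,product,revenue\n2024-01-05,Widget,120\n"))
print(missing_columns("date,item,amount\n2024-01-05,Widget,120\n"))
```

A tool could run a check like this first and return a failed `ToolResult` that names the missing columns, which gives the reflection step something concrete to report.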

4. Creating a tool registry

Now the agent needs a way to find tools by name.

class ToolRegistry:
    def __init__(self):
        self.tools: Dict[str, Tool] = {}
    def register(self, tool: Tool):
        if tool.name in self.tools:
            raise ValueError(f"Tool already registered: {tool.name}")
        self.tools[tool.name] = tool
    def get(self, tool_name: str) -> Tool:
        if tool_name not in self.tools:
            raise ValueError(f"Tool not found: {tool_name}")
        return self.tools[tool_name]
    def list_tools(self):
        return [
            {
                "name": tool.name,
                "description": tool.description
            }
            for tool in self.tools.values()
        ]

registry = ToolRegistry()
registry.register(DataInspectorTool())
registry.register(SalesAnalyzerTool())
registry.register(ReportGeneratorTool())
print(registry.list_tools())

This is small, but powerful.

Now I can add new tools without rewriting the whole agent.

Need email sending? Add an email tool. Need PDF reading? Add a PDF tool. Need database access? Add a database tool. Need browser search? Add a browser tool.

The agent does not need to know every detail. It only needs to know which tool fits which step.
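As a rough illustration of adding a tool without touching the agent, the sketch below registers one extra tool. `CsvRowCounterTool` is a hypothetical example, and `ToolResult`, `Tool`, and `ToolRegistry` are repeated in trimmed form only so the snippet runs on its own.

```python
import csv
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Dict, Optional


class ToolResult:
    def __init__(self, success: bool, data: Dict[str, Any], error: Optional[str] = None):
        self.success = success
        self.data = data
        self.error = error


class Tool(ABC):
    name: str
    description: str

    @abstractmethod
    def run(self, tool_input: Dict[str, Any]) -> ToolResult:
        ...


class ToolRegistry:
    def __init__(self) -> None:
        self.tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        if tool.name in self.tools:
            raise ValueError(f"Tool already registered: {tool.name}")
        self.tools[tool.name] = tool

    def get(self, tool_name: str) -> Tool:
        if tool_name not in self.tools:
            raise ValueError(f"Tool not found: {tool_name}")
        return self.tools[tool_name]


class CsvRowCounterTool(Tool):
    """Hypothetical extra tool: counts data rows using only the stdlib."""
    name = "csv_row_counter"
    description = "Counts the data rows in a CSV file, excluding the header."

    def run(self, tool_input: Dict[str, Any]) -> ToolResult:
        try:
            with open(tool_input["file_path"], newline="") as handle:
                rows = list(csv.reader(handle))
            return ToolResult(True, {"row_count": max(len(rows) - 1, 0)})
        except Exception as error:
            return ToolResult(False, {}, str(error))


registry = ToolRegistry()
registry.register(CsvRowCounterTool())

# Write a throwaway two-row file so the example has something to read.
with tempfile.TemporaryDirectory() as folder:
    sample = Path(folder) / "sample.csv"
    sample.write_text("date,product,revenue\n2024-01-05,Widget,120\n2024-01-09,Gadget,80\n")
    result = registry.get("csv_row_counter").run({"file_path": str(sample)})

print(result.success, result.data)
```

The only agent-facing change is the `register` call. The planner still has to know when the new tool applies, which is a separate concern.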

5. Building a planner that turns goals into steps

Now comes the interesting part.

The planner takes a messy human goal and turns it into an execution plan.

In production, I would use an LLM for this. Here, simple keyword rules stand in for the model so the structure stays easy to see.

class AgentPlanner:
    def create_plan(self, user_goal: str, file_path: str) -> List[AgentStep]:
        goal_lower = user_goal.lower()
        steps = []
        if "sales" in goal_lower or "revenue" in goal_lower:
            steps.append(
                AgentStep(
                    step_id=str(uuid4()),
                    description="Inspect the sales data file.",
                    tool_name="data_inspector",
                    tool_input={"file_path": file_path}
                )
            )
            steps.append(
                AgentStep(
                    step_id=str(uuid4()),
                    description="Analyze revenue trends and top products.",
                    tool_name="sales_analyzer",
                    tool_input={"file_path": file_path}
                )
            )
            steps.append(
                AgentStep(
                    step_id=str(uuid4()),
                    description="Generate a business report from the sales analysis.",
                    tool_name="report_generator",
                    tool_input={}
                )
            )
        else:
            steps.append(
                AgentStep(
                    step_id=str(uuid4()),
                    description="Inspect the provided data.",
                    tool_name="data_inspector",
                    tool_input={"file_path": file_path}
                )
            )
        return steps

This is the part most people skip.

They ask the model for the final answer immediately.

But a strong agent should first ask:

What steps are needed?

That is how you move from chat to execution.
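For the LLM-backed planner mentioned earlier, the main extra work is treating the model's output as untrusted. The sketch below assumes the model is asked to return a JSON list of steps. `fake_llm` is a stand-in for a real model call, and `plan_with_llm` and `ALLOWED_TOOLS` are hypothetical names. No real LLM API is used here.

```python
import json
from typing import Any, Callable, Dict, List

ALLOWED_TOOLS = {"data_inspector", "sales_analyzer", "report_generator"}


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns a canned JSON plan."""
    return json.dumps([
        {"description": "Inspect the sales data file.", "tool_name": "data_inspector"},
        {"description": "Analyze revenue trends.", "tool_name": "sales_analyzer"},
        {"description": "Email the CEO.", "tool_name": "email_sender"},
    ])


def plan_with_llm(user_goal: str, llm: Callable[[str], str],
                  max_steps: int = 8) -> List[Dict[str, Any]]:
    prompt = (
        "Return a JSON list of steps, each with 'description' and 'tool_name'. "
        f"Allowed tools: {sorted(ALLOWED_TOOLS)}. Goal: {user_goal}"
    )
    try:
        raw_steps = json.loads(llm(prompt))
    except json.JSONDecodeError:
        return []  # unparseable output: the caller can fall back to rule-based planning
    if not isinstance(raw_steps, list):
        return []

    valid: List[Dict[str, Any]] = []
    for step in raw_steps:
        if not isinstance(step, dict):
            continue
        tool_name = step.get("tool_name")
        description = step.get("description")
        # Keep only steps that name a registered tool and carry a text description.
        if isinstance(tool_name, str) and tool_name in ALLOWED_TOOLS \
                and isinstance(description, str):
            valid.append({"description": description, "tool_name": tool_name})
    return valid[:max_steps]


plan = plan_with_llm("Analyze sales data", fake_llm)
print([step["tool_name"] for step in plan])
```

The canned response deliberately includes a tool that does not exist (`email_sender`), and the validation drops it. Whether to drop such a step silently or fail the whole plan is a design choice.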

6. Building the execution loop

The execution loop is the heart of the agent.

It runs each step, captures the result, stores observations, and passes useful data to the next step.

class AgentExecutor:
    def __init__(self, registry: ToolRegistry):
        self.registry = registry
    def execute(self, state: AgentState) -> AgentState:
        state.status = AgentStatus.EXECUTING
        previous_results = {}
        for index, step in enumerate(state.plan):
            if index >= state.max_steps:
                state.status = AgentStatus.FAILED
                state.observations.append("Stopped because maximum step limit was reached.")
                return state
            state.current_step_index = index
            step.status = StepStatus.RUNNING
            try:
                tool = self.registry.get(step.tool_name)
                if step.tool_name == "report_generator":
                    step.tool_input["analysis"] = previous_results.get("sales_analyzer")
                result = tool.run(step.tool_input)
                if result.success:
                    step.status = StepStatus.SUCCESS
                    step.result = result.data
                    previous_results[step.tool_name] = result.data
                    state.observations.append(
                        f"Step succeeded: {step.description}"
                    )
                else:
                    step.status = StepStatus.FAILED
                    step.error = result.error
                    state.observations.append(
                        f"Step failed: {step.description}. Error: {result.error}"
                    )
                    state.status = AgentStatus.FAILED
                    return state
            except Exception as error:
                step.status = StepStatus.FAILED
                step.error = str(error)
                state.status = AgentStatus.FAILED
                state.observations.append(
                    f"Unexpected failure in step: {step.description}. Error: {error}"
                )
                return state
        state.status = AgentStatus.REFLECTING
        return state

This is where the agent becomes real.

It is not just saying:

Here is what I would do.

It is actually doing it.

And because every step is tracked, I can see exactly what happened.

No mystery. No drama. No “the AI just did something weird.”
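The loop above stops at the first failed step. To get the "retry when something fails" behavior from the list at the start, one option is to wrap each tool call in a small retry helper. This is a sketch under assumptions: `run_with_retry` and `flaky_tool` are hypothetical, and the action is modeled as a function that returns a `(success, error)` pair.

```python
import time
from typing import Callable, Optional, Tuple


def run_with_retry(action: Callable[[], Tuple[bool, Optional[str]]],
                   max_attempts: int = 3,
                   base_delay: float = 0.0) -> Tuple[bool, int, Optional[str]]:
    """Call action until it succeeds or attempts run out.

    action returns (success, error).
    The result is (success, attempts_used, last_error).
    """
    last_error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        try:
            success, last_error = action()
        except Exception as error:
            success, last_error = False, str(error)
        if success:
            return True, attempt, None
        if attempt < max_attempts:
            # Exponential backoff: base_delay, then 2x, then 4x, and so on.
            time.sleep(base_delay * (2 ** (attempt - 1)))
    return False, max_attempts, last_error


calls = {"count": 0}


def flaky_tool() -> Tuple[bool, Optional[str]]:
    """Fails twice, then succeeds, to simulate a transient error."""
    calls["count"] += 1
    if calls["count"] < 3:
        return False, "temporary failure"
    return True, None


print(run_with_retry(flaky_tool, max_attempts=3))
```

Retries only make sense for transient failures such as a timeout. A missing column will fail the same way every time, so a real agent would want to tell those cases apart before retrying.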

7. Adding reflection so the agent can judge its own work

This is one of my favorite parts.

After execution, the agent should check whether the task was completed properly.

Did all steps succeed? Was the report generated? Were there errors? Is anything missing?

class AgentReflector:
    def reflect(self, state: AgentState) -> AgentState:
        failed_steps = [
            step for step in state.plan
            if step.status == StepStatus.FAILED
        ]
        successful_steps = [
            step for step in state.plan
            if step.status == StepStatus.SUCCESS
        ]
        if failed_steps:
            state.status = AgentStatus.FAILED
            state.final_output = self.create_failure_summary(state, failed_steps)
            return state
        report_step = next(
            (step for step in state.plan if step.tool_name == "report_generator"),
            None
        )
        if report_step and report_step.result:
            state.status = AgentStatus.COMPLETED
            state.final_output = report_step.result["report"]
            return state
        state.status = AgentStatus.COMPLETED
        state.final_output = (
            f"Completed {len(successful_steps)} steps successfully, "
            "but no final report was generated."
        )
        return state
    def create_failure_summary(self, state: AgentState, failed_steps: List[AgentStep]) -> str:
        lines = [
            "The agent could not complete the task.",
            "",
            "Failed Steps:"
        ]
        for step in failed_steps:
            lines.append(f"- {step.description}")
            lines.append(f"  Error: {step.error}")
        lines.append("")
        lines.append("Suggestion:")
        lines.append("Check the input file, required columns, or tool configuration.")
        return "\n".join(lines)

Reflection is important because an agent should not blindly assume success.

A weak agent says:

Done.

A better agent says:

I completed the task, here is the output, and here is what I verified.

That second version is much more useful.
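To back up "here is what I verified", the reflector could run explicit checks on the analysis itself, not only on step statuses. The sketch below is one hypothetical way to do that. `verify_analysis` is an illustrative helper that expects the same keys `SalesAnalyzerTool` returns, and the sample numbers are made up.

```python
from typing import Any, Dict, List


def verify_analysis(analysis: Dict[str, Any], tolerance: float = 0.01) -> List[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    issues: List[str] = []
    total = analysis.get("total_revenue")
    monthly = analysis.get("monthly_revenue") or []
    top_products = analysis.get("top_products") or {}

    if not isinstance(total, (int, float)) or total < 0:
        issues.append("total_revenue is missing or negative")
    if not monthly:
        issues.append("monthly_revenue is empty")
    if not top_products:
        issues.append("top_products is empty")

    # Cross-check: the monthly figures should add up to the reported total.
    if isinstance(total, (int, float)) and monthly:
        monthly_sum = sum(row["revenue"] for row in monthly)
        if abs(monthly_sum - total) > tolerance:
            issues.append(
                f"monthly revenue sums to {monthly_sum}, but total_revenue is {total}"
            )
    return issues


# Made-up sample values, only to exercise the checks.
good = {
    "total_revenue": 300.0,
    "monthly_revenue": [
        {"date": "2024-01", "revenue": 200.0},
        {"date": "2024-02", "revenue": 100.0},
    ],
    "top_products": {"Widget": 300.0},
}
bad = dict(good, total_revenue=350.0)

print(verify_analysis(good))
print(len(verify_analysis(bad)))
```

The cross-check is not arbitrary. In `SalesAnalyzerTool`, rows whose date fails to parse drop out of the monthly grouping but still count toward the total, so a mismatch here points at bad dates in the input.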

8. Running the full agent system

Now let’s connect everything.

class AIAgentSystem:
    def __init__(self, registry: ToolRegistry):
        self.registry = registry
        self.planner = AgentPlanner()
        self.executor = AgentExecutor(registry)
        self.reflector = AgentReflector()
    def run(self, user_goal: str, file_path: str) -> AgentState:
        state = create_agent_state(user_goal)
        state.plan = self.planner.create_plan(
            user_goal=user_goal,
            file_path=file_path
        )
        state.observations.append(
            f"Created a plan with {len(state.plan)} steps."
        )
        state = self.executor.execute(state)
        if state.status == AgentStatus.REFLECTING:
            state = self.reflector.reflect(state)
        return state

agent_system = AIAgentSystem(registry)
final_state = agent_system.run(
    user_goal="Analyze sales data, identify trends, and generate a report.",
    file_path="sales_data.csv"
)
print("Status:", final_state.status)
print("\nObservations:")
for observation in final_state.observations:
    print("-", observation)
print("\nFinal Output:")
print(final_state.final_output)

This is the full loop:

Plan → Act → Observe → Reflect → Answer

That loop is the foundation of many advanced AI agent systems.

The magic is not just in the LLM. The magic is in the workflow around it.
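The run above expects a `sales_data.csv` with `date`, `product`, and `revenue` columns. If no such file is at hand, a small helper like the one below can write a sample. The rows are invented purely for testing, and `write_sample_sales` is a hypothetical helper name.

```python
import csv
import tempfile
from pathlib import Path
from typing import Union

# Invented sample rows: (date, product, revenue).
SAMPLE_ROWS = [
    ("2024-01-05", "Widget", 120.0),
    ("2024-01-19", "Gadget", 80.0),
    ("2024-02-02", "Widget", 150.0),
    ("2024-02-20", "Gizmo", 60.0),
    ("2024-03-11", "Gadget", 95.0),
]


def write_sample_sales(path: Union[str, Path]) -> Path:
    """Write a tiny CSV with the date, product, and revenue columns."""
    target = Path(path)
    with target.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["date", "product", "revenue"])
        writer.writerows(SAMPLE_ROWS)
    return target


# Demo in a temporary folder so nothing is left behind.
with tempfile.TemporaryDirectory() as folder:
    created = write_sample_sales(Path(folder) / "sales_data.csv")
    with created.open(newline="") as handle:
        loaded = list(csv.DictReader(handle))

print(len(loaded), sorted(loaded[0].keys()))
```

Calling `write_sample_sales("sales_data.csv")` in the working directory before `agent_system.run(...)` gives the agent a file to work on.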

9. Final thoughts

The most exciting AI systems are not the ones that only answer questions.

They are the ones that get things done.

A chatbot is useful when I need explanation. An AI agent is useful when I need execution.

The real future of AI is not:

Ask question → get answer.

It is:

Set goal → agent plans → agent acts → agent checks → agent improves → user gets result.

That is why I think agent systems are such a big deal.

They turn AI from a talking tool into a working tool.

Of course, this does not mean agents should run wild. A serious agent needs limits, logs, approvals, and human control for risky actions.

But when designed properly, AI agents can become powerful teammates.

Not because they replace thinking.

But because they remove repetitive work, organize messy tasks, and let people focus on decisions instead of button-clicking.

That is the version of AI I care about most.

Not the AI that only sounds smart.

The AI that actually helps.