smahmudrahat/web_application

Web Agent Evaluation Framework

A modular framework for implementing and benchmarking different web agents with a standardized interface.

Overview

This framework provides a standardized way to implement and evaluate web agents built on different LLM providers (such as OpenAI and Anthropic) behind one consistent interface. The project separates agent implementations from benchmarking logic, which makes it easy to add new agents and run comparative evaluations.

Project Structure

src/
├── agents/                 # Agent implementations
│   ├── interface.py        # Core interface and data models
│   ├── openai/             # OpenAI agent implementation
│   ├── anthropic/          # Anthropic agent implementation (template)
│   └── browser_use/        # Browser-based agent implementation (template)
├── benchmark/              # Benchmarking tools
│   └── test.py             # Example test script
└── utils/                  # Utility functions

Key Components

Agent Interface

All agent implementations must adhere to the standardized interface defined in src/agents/interface.py. This ensures consistent input/output formatting across different implementations:

  • AgentTaskExecutionInput: Standardized input format with task description and optional parameters
  • AgentStep: Format for recording intermediate steps during task execution
  • AgentTaskExecutionResult: Standardized output format for agent responses
  • AgentInterface: Abstract base class that all agent implementations must extend
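
As a rough orientation, the four components above could be sketched as follows. This is a minimal illustration, not the contents of src/agents/interface.py: it uses standard-library dataclasses so it runs on its own, the fields of AgentStep and the steps field on the result are hypothetical, and only the field names that appear elsewhere in this README (task, debug, answer, final_url, execution_time_seconds) come from the project.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AgentTaskExecutionInput:
    task: str            # natural-language task description
    debug: bool = False  # optional parameter, as used in the Usage example


@dataclass
class AgentStep:
    # Hypothetical fields: the README does not list what a step records.
    description: str
    url: Optional[str] = None


@dataclass
class AgentTaskExecutionResult:
    task: str
    answer: str
    final_url: str
    execution_time_seconds: float
    # Hypothetical field for the intermediate steps taken by the agent.
    steps: List[AgentStep] = field(default_factory=list)


class AgentInterface(ABC):
    """Base class: every agent exposes the same execute_task signature."""

    @abstractmethod
    def execute_task(self, input: AgentTaskExecutionInput) -> AgentTaskExecutionResult:
        ...
```

Because execute_task is abstract, a subclass that forgets to implement it cannot be instantiated, which is one way a shared base class keeps agents interchangeable.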

Sample Implementation

The project includes a sample OpenAI agent implementation (src/agents/openai/openai_agent.py) that demonstrates how to implement the interface. This implementation:

  • Uses a Playwright-based browser controller
  • Handles task execution, timing, and result formatting
  • Follows the standardized input/output interface
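
The timing and result-formatting responsibilities listed above might look roughly like the sketch below. It is not the code in openai_agent.py: StubBrowserController and its run method are hypothetical placeholders for the Playwright-based controller, and TaskResult is a stand-in that only borrows the field names of AgentTaskExecutionResult.

```python
import time
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TaskResult:
    """Stand-in for AgentTaskExecutionResult (same field names as in this README)."""
    task: str
    answer: str
    final_url: str
    execution_time_seconds: float


class StubBrowserController:
    """Hypothetical placeholder for a Playwright-based browser controller."""

    def run(self, task: str) -> Tuple[str, str]:
        # A real controller would drive a browser here; this stub returns fixed data.
        return f"stub answer for: {task}", "https://example.com/"


def execute_with_timing(controller: StubBrowserController, task: str) -> TaskResult:
    # Measure wall-clock time around the browser work only.
    start = time.perf_counter()
    answer, final_url = controller.run(task)
    elapsed = time.perf_counter() - start
    # Package everything into the standardized result shape.
    return TaskResult(
        task=task,
        answer=answer,
        final_url=final_url,
        execution_time_seconds=elapsed,
    )
```

Calling execute_with_timing(StubBrowserController(), "some task") returns a TaskResult whose execution_time_seconds covers just the controller call.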

Installation

This project uses Poetry for dependency management.

Mac Installation

# Install Poetry
brew install poetry

# Configure Poetry to create virtual environments in the project directory
poetry config virtualenvs.in-project true --local

# Install dependencies
poetry install

# Run a script
poetry run python src/benchmark/test.py

Adding Dependencies

To add new dependencies:

poetry add <package-name>

Usage

Here's a simple example of how to use an agent:

from src.agents import OpenAI_Agent
from src.agents.interface import AgentTaskExecutionInput

# Create an agent
agent = OpenAI_Agent()

# Define a task
task_input = AgentTaskExecutionInput(
    task="Find out how the pydantic model can be converted into a json schema by going to the pydantic website and finding the exact latest documentation.",
    debug=True
)

# Execute the task (the variable is named task_input to avoid shadowing the built-in input())
result = agent.execute_task(input=task_input)

# Print the result
print(result)

Extending with New Agents

To add a new agent implementation:

  1. Create a new directory under src/agents/ for your implementation (e.g., src/agents/my_agent/)

  2. Implement the AgentInterface class:

    import time

    from src.agents.interface import AgentInterface, AgentTaskExecutionResult, AgentTaskExecutionInput
    
    class MyAgent(AgentInterface):
        # Optional: __init__ can be omitted if your agent needs no setup.
        def __init__(self):
            # Initialize your agent
            pass
            
        def execute_task(self, input: AgentTaskExecutionInput) -> AgentTaskExecutionResult:
            # Start timing so the result can report how long the task took
            start_time = time.perf_counter()

            # Implement task execution logic
            # ...

            execution_time = time.perf_counter() - start_time
            
            # Return result in standardized format
            return AgentTaskExecutionResult(
                task=input.task,
                answer="Your agent's answer",
                final_url="https://final-url.com",
                execution_time_seconds=execution_time
            )

  3. Add your agent to the src/agents/__init__.py file:

    from .interface import AgentTaskExecutionResult, AgentInterface
    from .openai.openai_agent import OpenAI_Agent
    from .my_agent.my_agent import MyAgent

  4. Create test cases in the benchmark directory to evaluate your agent
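
For step 4, a comparative test case could follow the pattern sketched below. This is only an illustration under stated assumptions: TaskInput and TaskResult are simplified stand-ins for the real interface models, EchoAgent is a hypothetical toy agent, and run_benchmark and mean_time are made-up helper names, not functions provided by src/benchmark/.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass
class TaskInput:
    """Simplified stand-in for AgentTaskExecutionInput."""
    task: str


@dataclass
class TaskResult:
    """Simplified stand-in for AgentTaskExecutionResult."""
    task: str
    answer: str
    execution_time_seconds: float


class Agent(Protocol):
    def execute_task(self, input: TaskInput) -> TaskResult: ...


class EchoAgent:
    """Hypothetical toy agent that answers with the upper-cased task."""

    def execute_task(self, input: TaskInput) -> TaskResult:
        start = time.perf_counter()
        answer = input.task.upper()
        return TaskResult(input.task, answer, time.perf_counter() - start)


def run_benchmark(agents: Dict[str, Agent], tasks: List[str]) -> Dict[str, List[TaskResult]]:
    # Run every task through every agent using the same call signature.
    return {
        name: [agent.execute_task(input=TaskInput(task=t)) for t in tasks]
        for name, agent in agents.items()
    }


def mean_time(results: List[TaskResult]) -> float:
    # Average execution time across one agent's results.
    return sum(r.execution_time_seconds for r in results) / len(results)


if __name__ == "__main__":
    report = run_benchmark({"echo": EchoAgent()}, ["find the docs", "open the homepage"])
    for name, results in report.items():
        print(name, len(results), "tasks, mean seconds:", mean_time(results))
```

Since every agent is called through the same execute_task signature, adding another agent to the comparison only means adding one more entry to the agents dictionary.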

Contributing

Please ensure that any new agent implementations follow the standardized interface defined in src/agents/interface.py.
