Architecture

Testbase is built around simplicity and flexibility. Two agent types, two runtimes, and a unified MCP configuration are the building blocks for multi-agent systems, with no hidden orchestration layer.

Core principles

  1. Simplicity - Two agent types ('llm' and 'computer'), clear separation of concerns
  2. Runtime abstraction - Switch between local and cloud execution without changing code
  3. Manual composition - You control workflow orchestration (no magic)
  4. Session continuity - Automatic thread management across multiple tasks
  5. Professional tooling - Production-ready billing, authentication, and monitoring

High-level architecture

┌──────────────────────────────────────────────────────────┐
│                     Your Application                     │
│           (Multi-agent workflow orchestration)           │
└──────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────┐
│                   computer-agents SDK                    │
│  • Agent class (unified interface)                       │
│  • run() function (execution loop)                       │
│  • Runtime abstraction                                   │
│  • MCP server integration                                │
│  • Session management                                    │
└──────────────────────────────────────────────────────────┘
           │                                  │
           ▼                                  ▼
┌─────────────────────┐          ┌─────────────────────────┐
│    LocalRuntime     │          │      CloudRuntime       │
│   (Your Machine)    │          │        (GCE VM)         │
├─────────────────────┤          ├─────────────────────────┤
│ • Codex SDK         │          │ • Workspace upload      │
│ • Local workspace   │          │ • API authentication    │
│ • Direct execution  │          │ • GCE execution         │
│ • Fast iteration    │          │ • Workspace download    │
│ • Free              │          │ • Usage billing         │
└─────────────────────┘          └─────────────────────────┘
                             ┌─────────────────────────────┐
                             │ Testbase Cloud (GCE VM)     │
                             │ • Express API server        │
                             │ • Codex SDK execution       │
                             │ • GCS workspace sync        │
                             │ • API key auth              │
                             │ • Usage tracking & billing  │
                             │ • SQLite database           │
                             └─────────────────────────────┘
                             ┌─────────────────────────────┐
                             │ Google Cloud Storage        │
                             │ • Workspace persistence     │
                             │ • Session metadata          │
                             │ • Thread cache              │
                             └─────────────────────────────┘

Agent types

Testbase has exactly two agent types:

LLM Agents (agentType: 'llm')

Purpose: Reasoning, planning, analysis, review

Execution: OpenAI API (direct API calls)

Capabilities:

  • Chat and reasoning
  • Tool/function calling
  • JSON schema output
  • Handoffs to other agents

Requirements:

  • Model name (e.g., 'gpt-4o')
  • OpenAI API key

Use cases:

  • Creating implementation plans
  • Reviewing code for quality
  • Analyzing requirements
  • Decision-making

Example:

    const planner = new Agent({
      agentType: 'llm',
      model: 'gpt-4o',
      instructions: 'Create detailed implementation plans.'
    });

Computer Agents (agentType: 'computer')

Purpose: Code execution, file operations, terminal commands

Execution: Codex SDK (computer-use agent)

Capabilities:

  • Read and write files
  • Run shell commands
  • Execute code
  • Install packages
  • Run tests

Requirements:

  • Runtime (LocalRuntime or CloudRuntime)
  • Workspace (git repository)
  • OpenAI API key

Use cases:

  • Writing code
  • Running tests
  • Modifying files
  • Building projects
  • Deploying applications

Example:

    const executor = new Agent({
      agentType: 'computer',
      runtime: new LocalRuntime(),
      workspace: './project',
      instructions: 'Execute code changes as requested.'
    });
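
The requirements listed for each agent type can also be expressed in the type system. The following is a minimal sketch, assuming hypothetical `LlmAgentConfig` and `ComputerAgentConfig` shapes and a hypothetical `missingRequirements` helper. None of these names come from the SDK; they only illustrate how the two types differ in what they need.

```typescript
// Hypothetical config shapes, for illustration only; not the SDK's exported types.
interface LlmAgentConfig {
  agentType: 'llm';
  model: string; // e.g. 'gpt-4o'
  instructions?: string;
}

interface ComputerAgentConfig {
  agentType: 'computer';
  runtimeKind: 'local' | 'cloud'; // stand-in for a LocalRuntime/CloudRuntime instance
  workspace: string;              // path to a git repository
  instructions?: string;
}

type AgentConfig = LlmAgentConfig | ComputerAgentConfig;

// Returns the list of missing requirements; an empty list means the config is usable.
function missingRequirements(config: AgentConfig, openAiKey: string | undefined): string[] {
  const missing: string[] = [];
  if (!openAiKey) missing.push('OpenAI API key'); // both agent types need a key
  switch (config.agentType) {
    case 'llm':
      if (config.model.trim() === '') missing.push('model name');
      break;
    case 'computer':
      if (config.workspace.trim() === '') missing.push('workspace');
      break;
  }
  return missing;
}
```

Because `agentType` is the discriminant, the compiler rejects an `'llm'` config that carries a `workspace`, or a `'computer'` config that omits one, before the code ever runs.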

Runtimes

Runtimes determine where and how computer agents execute.
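
One way to picture the abstraction: both runtimes satisfy a single interface, so calling code does not need to know which one it received. The sketch below uses a hypothetical `Runtime` interface and two stub classes. These are assumptions for illustration, and the SDK's real runtime contract may look different.

```typescript
// Hypothetical interface; the SDK's real runtime contract may differ.
interface Runtime {
  readonly name: string;
  execute(task: string, workspace: string): Promise<string>;
}

// Stub standing in for local execution against the workspace on disk.
class StubLocalRuntime implements Runtime {
  readonly name = 'local';
  async execute(task: string, workspace: string): Promise<string> {
    return `[local] ${task} in ${workspace}`;
  }
}

// Stub standing in for cloud execution; a real one would sync files and call an API.
class StubCloudRuntime implements Runtime {
  readonly name = 'cloud';
  async execute(task: string, workspace: string): Promise<string> {
    return `[cloud] ${task} in ${workspace}`;
  }
}

// Calling code depends only on the interface, so swapping runtimes is a config change.
async function runTask(runtime: Runtime, task: string): Promise<string> {
  return runtime.execute(task, './project');
}
```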

LocalRuntime

Execution environment: Your local machine

How it works:

  1. Codex SDK runs locally as a subprocess
  2. Operates directly on your workspace files
  3. Fast execution (no network overhead)
  4. Free (no cloud costs)

When to use:

  • Development and testing
  • Fast iteration cycles
  • Small to medium projects
  • No isolation requirements

Configuration:

    const runtime = new LocalRuntime({
      debug: true // Show execution details
    });

CloudRuntime

Execution environment: GCE VM with GCS persistence

How it works:

  1. Upload workspace to GCS
  2. POST task to GCE VM API
  3. Codex SDK executes on VM
  4. Download updated workspace from GCS
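
The four steps above can be sketched as a single function. Everything here is a stand-in: `WorkspaceStore`, `TaskApi`, `runInCloud`, and the in-memory fakes are hypothetical names used to show the order of operations, not the SDK's or the cloud API's actual interface.

```typescript
// All names here are hypothetical stand-ins for the four steps, not SDK APIs.
type Files = Record<string, string>;

interface WorkspaceStore {
  upload(id: string, files: Files): Promise<void>;
  download(id: string): Promise<Files>;
}

interface TaskApi {
  // Represents the POST to the VM; resolves once remote execution has finished.
  submit(workspaceId: string, task: string): Promise<{ output: string }>;
}

async function runInCloud(
  store: WorkspaceStore,
  api: TaskApi,
  workspaceId: string,
  files: Files,
  task: string,
): Promise<{ output: string; files: Files }> {
  await store.upload(workspaceId, files);                 // 1. upload workspace
  const { output } = await api.submit(workspaceId, task); // 2-3. POST task, remote execution
  const updated = await store.download(workspaceId);      // 4. download updated workspace
  return { output, files: updated };
}

// In-memory fakes so the flow can be exercised without any network access.
function createFakes(): { store: WorkspaceStore; api: TaskApi } {
  const buckets: Record<string, Files | undefined> = {};
  const store: WorkspaceStore = {
    async upload(id, files) { buckets[id] = { ...files }; },
    async download(id) { return { ...(buckets[id] ?? {}) }; },
  };
  const api: TaskApi = {
    async submit(id, task) {
      // Pretend the remote agent wrote one file in response to the task.
      buckets[id] = { ...(buckets[id] ?? {}), 'RESULT.txt': task };
      return { output: `done: ${task}` };
    },
  };
  return { store, api };
}
```

The upload and download steps bracket every task, which is why the overhead grows with workspace size.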

When to use:

  • Production workloads
  • Isolated execution
  • Large repositories
  • Usage tracking/billing required
  • Team collaboration

Configuration:

    const runtime = new CloudRuntime({
      apiKey: process.env.TESTBASE_API_KEY,
      debug: true // Show execution details
    });

Performance characteristics:

  • Small workspaces (< 10 files): ~2-4s overhead
  • Medium workspaces (50-100 files): ~10-20s overhead
  • Large workspaces (500+ files): ~40-60s overhead

See Cloud Platform for detailed documentation.

Session management

Testbase automatically manages session continuity across multiple run() calls.

How it works

  1. First run: Creates new Codex SDK thread
  2. Subsequent runs: Reuses same thread automatically
  3. Thread ID: Available as agent.currentThreadId
  4. Reset: Call agent.resetSession() to start fresh

    const agent = new Agent({
      agentType: 'computer',
      runtime: new LocalRuntime(),
      workspace: './project'
    });

    // First run - new session
    await run(agent, 'Create app.py');
    console.log(agent.currentThreadId); // "thread-abc-123"

    // Second run - continues session
    await run(agent, 'Add error handling');
    console.log(agent.currentThreadId); // "thread-abc-123" (same!)

    // Reset to start new session
    agent.resetSession();
    await run(agent, 'New project');
    console.log(agent.currentThreadId); // "thread-xyz-789" (new!)
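
To make the thread reuse concrete, here is a minimal sketch of how a session holder with this behavior might work. `SessionHolder`, `acquireThread`, and the counter-based ids are hypothetical; this page does not show the SDK's internals, so treat it as a model of the observable behavior only.

```typescript
// Hypothetical model of thread reuse; not the SDK's actual implementation.
class SessionHolder {
  private threadId: string | undefined;
  private counter = 0;

  get currentThreadId(): string | undefined {
    return this.threadId;
  }

  // Returns the existing thread id, creating one only when none is active.
  acquireThread(): string {
    if (this.threadId === undefined) {
      this.counter += 1;
      this.threadId = `thread-${this.counter}`;
    }
    return this.threadId;
  }

  // Forget the current thread so the next acquireThread() starts fresh.
  resetSession(): void {
    this.threadId = undefined;
  }
}
```

Each run would call `acquireThread()` before executing, so consecutive runs share one thread until `resetSession()` clears it.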

Benefits

  • No manual management: Sessions just work
  • Context preservation: Agent remembers previous conversation
  • Incremental development: Build up complex implementations step-by-step
  • Natural workflows: Matches how humans work (“now add…”, “fix that…”)

MCP server integration

Testbase uses a unified MCP configuration that works for both agent types.

Configuration format

    import type { McpServerConfig } from 'computer-agents';

    const mcpServers: McpServerConfig[] = [
      // Local stdio server
      {
        type: 'stdio',
        name: 'filesystem',
        command: 'npx',
        args: ['@modelcontextprotocol/server-filesystem', '/workspace']
      },
      // Remote HTTP server
      {
        type: 'http',
        name: 'notion',
        url: 'https://notion-mcp.example.com/mcp',
        bearerToken: process.env.NOTION_TOKEN
      }
    ];

    // Works for both agent types!
    const llmAgent = new Agent({
      agentType: 'llm',
      mcpServers // Auto-converted to function tools
    });

    const computerAgent = new Agent({
      agentType: 'computer',
      runtime: new LocalRuntime(),
      mcpServers // Passed to Codex SDK
    });

How MCP servers are used

LLM agents:

  • MCP servers converted to OpenAI function tools
  • Agent can call tools during execution
  • Results returned to LLM for processing

Computer agents:

  • MCP servers passed directly to Codex SDK
  • Codex manages tool selection and execution
  • Integrated with computer-use capabilities
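
For LLM agents, the conversion step might look like the following sketch. `McpTool` mirrors the tool shape returned by an MCP server's tool listing, and `OpenAiFunctionTool` mirrors the function-tool shape of the OpenAI Chat Completions API. The `toFunctionTool` helper and the `server__tool` naming scheme are assumptions for illustration, not documented Testbase behavior.

```typescript
// Shape of a tool as listed by an MCP server.
interface McpTool {
  name: string;
  description?: string;
  inputSchema: { type: 'object'; properties?: Record<string, unknown>; required?: string[] };
}

// Shape of a function tool in the OpenAI Chat Completions API.
interface OpenAiFunctionTool {
  type: 'function';
  function: { name: string; description?: string; parameters: Record<string, unknown> };
}

// Assumption: prefix tool names with the server name to avoid collisions across servers.
function toFunctionTool(serverName: string, tool: McpTool): OpenAiFunctionTool {
  return {
    type: 'function',
    function: {
      name: `${serverName}__${tool.name}`,
      // Only include a description when the MCP server provided one.
      ...(tool.description !== undefined ? { description: tool.description } : {}),
      // The MCP input schema is already JSON Schema, so it can be passed through.
      parameters: { ...tool.inputSchema },
    },
  };
}
```

Both formats describe arguments with JSON Schema, so the conversion is mostly a matter of renaming fields.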

See MCP Integration for detailed documentation.

Multi-agent workflows

Testbase supports manual workflow composition - you control how agents interact.

Pattern: Plan → Execute → Review

    const planner = new Agent({ agentType: 'llm', model: 'gpt-4o' });
    const executor = new Agent({ agentType: 'computer', runtime: new LocalRuntime() });
    const reviewer = new Agent({ agentType: 'llm', model: 'gpt-4o' });

    const task = 'Add user authentication';

    const plan = await run(planner, `Plan: ${task}`);
    const implementation = await run(executor, plan.finalOutput);
    const review = await run(reviewer, `Review: ${implementation.finalOutput}`);

Pattern: Iterative refinement

    let task = 'Add user authentication'; // declared with let: reassigned inside the loop
    let approved = false;

    while (!approved) {
      const implementation = await run(executor, task);
      const review = await run(reviewer, implementation.finalOutput);

      if (review.finalOutput.includes('APPROVED')) {
        approved = true;
      } else {
        task = `Fix these issues: ${review.finalOutput}`;
      }
    }
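
The loop above runs until the reviewer approves, which may never happen. A bounded variant is sketched below. `RunFn`, `refine`, and the default of three rounds are hypothetical; in practice `execute` and `review` would wrap calls to `run()` with the executor and reviewer agents.

```typescript
// Minimal stand-in for the run() result used in the loop above.
type RunResult = { finalOutput: string };
type RunFn = (prompt: string) => Promise<RunResult>;

// Repeats execute/review until the reviewer approves or maxRounds is reached.
async function refine(
  execute: RunFn,
  review: RunFn,
  initialTask: string,
  maxRounds = 3, // hypothetical default
): Promise<{ approved: boolean; rounds: number; lastReview: string }> {
  let task = initialTask;
  let lastReview = '';
  for (let round = 1; round <= maxRounds; round++) {
    const implementation = await execute(task);
    const verdict = await review(implementation.finalOutput);
    lastReview = verdict.finalOutput;
    if (lastReview.includes('APPROVED')) {
      return { approved: true, rounds: round, lastReview };
    }
    // Feed the reviewer's feedback into the next round.
    task = `Fix these issues: ${lastReview}`;
  }
  return { approved: false, rounds: maxRounds, lastReview };
}
```

Note that a substring check also matches "NOT APPROVED", so the reviewer's instructions should ask for an unambiguous token.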

Pattern: Agent handoffs

Use the OpenAI Agents SDK handoff system:

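    // Define agents in reverse order so each handoff target exists before it is referenced
    const reviewer = new Agent({ agentType: 'llm', model: 'gpt-4o' });

    const executor = new Agent({
      agentType: 'computer',
      runtime: new LocalRuntime(),
      handoffs: [reviewer], // Can hand off to reviewer
      instructions: 'Execute plans and hand off to reviewer.'
    });

    const planner = new Agent({
      agentType: 'llm',
      model: 'gpt-4o',
      handoffs: [executor], // Can hand off to executor
      instructions: 'Create plans and hand off to executor.'
    });

    // Start with the planner; it manages handoffs
    const result = await run(planner, 'Build a calculator');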

Cloud infrastructure

The cloud platform provides isolated execution with billing and monitoring.

Components

GCE VM (34.170.205.13:8080):

  • Express API server
  • Codex SDK execution
  • API key authentication
  • SQLite database

GCS Bucket (gs://testbase-workspaces):

  • Workspace persistence
  • Session metadata
  • Thread cache

API Keys:

  • Standard keys: Pay-per-token billing
  • Internal keys: Unlimited usage (testing)

Billing System:

  • Token usage tracking
  • Credit-based payments
  • Daily/monthly spending limits
  • Real-time balance checks
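
A pre-run check combining the balance and spending-limit ideas above could look like this sketch. The `AccountState` fields, the `checkSpend` function, and the use of abstract "credits" are all assumptions for illustration; the actual billing schema and rates are not described on this page.

```typescript
// Hypothetical account shape; field names and units are illustrative only.
interface AccountState {
  balanceCredits: number;
  spentTodayCredits: number;
  dailyLimitCredits: number;
  internalKey: boolean; // internal keys are described above as unlimited
}

type Decision = { allowed: true } | { allowed: false; reason: string };

// Checks an estimated charge against the balance and the daily limit before a run.
function checkSpend(account: AccountState, estimatedCredits: number): Decision {
  if (account.internalKey) return { allowed: true };
  if (estimatedCredits > account.balanceCredits) {
    return { allowed: false, reason: 'insufficient balance' };
  }
  if (account.spentTodayCredits + estimatedCredits > account.dailyLimitCredits) {
    return { allowed: false, reason: 'daily limit exceeded' };
  }
  return { allowed: true };
}
```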

See Cloud Platform for complete documentation.

Design decisions

Why only two agent types?

Simplicity over flexibility. Specialized roles (planner, reviewer) are workflow patterns, not fundamental types. You compose them manually rather than having built-in orchestration.

Why runtime abstraction?

Portability. The same agent code runs locally or in the cloud; only the runtime configuration changes. Develop locally, then deploy to the cloud without rewriting your agents.

Why manual composition?

Transparency. No hidden orchestration logic. You see and control exactly how agents interact.

Why Codex SDK?

Performance and reliability. Direct SDK integration instead of CLI spawning. Clean async/await API with built-in session management.

Next steps

Understanding this architecture will help you build sophisticated agent systems with confidence.
