Architecture

Testbase is built around simplicity and flexibility. Two agent types, two runtimes, and a unified MCP configuration give you everything you need to build sophisticated multi-agent systems without unnecessary complexity.

Core principles

Simplicity - Two agent types ('llm' and 'computer'), clear separation of concerns
Runtime abstraction - Switch between local and cloud execution without changing code
Manual composition - You control workflow orchestration (no magic)
Session continuity - Automatic thread management across multiple tasks
Professional tooling - Production-ready billing, authentication, and monitoring

High-level architecture


┌──────────────────────────────────────────────────────────┐
│                  Your Application                        │
│           (Multi-agent workflow orchestration)           │
└──────────────────────────────────────────────────────────┘
                            │
                            ▼
┌──────────────────────────────────────────────────────────┐
│               computer-agents SDK                        │
│   • Agent class (unified interface)                      │
│   • run() function (execution loop)                      │
│   • Runtime abstraction                                  │
│   • MCP server integration                               │
│   • Session management                                   │
└──────────────────────────────────────────────────────────┘
                │                           │
                ▼                           ▼
     ┌─────────────────────┐    ┌─────────────────────────┐
     │   LocalRuntime      │    │   CloudRuntime          │
     │   (Your Machine)    │    │   (GCE VM)              │
     ├─────────────────────┤    ├─────────────────────────┤
     │ • Codex SDK         │    │ • Workspace upload      │
     │ • Local workspace   │    │ • API authentication    │
     │ • Direct execution  │    │ • GCE execution         │
     │ • Fast iteration    │    │ • Workspace download    │
     │ • Free              │    │ • Usage billing         │
     └─────────────────────┘    └─────────────────────────┘
                                            │
                                            ▼
                               ┌─────────────────────────────┐
                               │  Testbase Cloud (GCE VM)    │
                               │  • Express API server       │
                               │  • Codex SDK execution      │
                               │  • GCS workspace sync       │
                               │  • API key auth             │
                               │  • Usage tracking & billing │
                               │  • SQLite database          │
                               └─────────────────────────────┘
                                            │
                                            ▼
                               ┌─────────────────────────────┐
                               │  Google Cloud Storage       │
                               │  • Workspace persistence    │
                               │  • Session metadata         │
                               │  • Thread cache             │
                               └─────────────────────────────┘

Agent types

Testbase has exactly two agent types:

LLM Agents (`agentType: 'llm'`)

Purpose: Reasoning, planning, analysis, review

Execution: OpenAI API (direct API calls)

Capabilities:

Chat and reasoning
Tool/function calling
JSON schema output
Handoffs to other agents

Requirements:

Model name (e.g., 'gpt-4o')
OpenAI API key

Use cases:

Creating implementation plans
Reviewing code for quality
Analyzing requirements
Decision-making

Example:


import { Agent } from 'computer-agents';

const planner = new Agent({
  agentType: 'llm',
  model: 'gpt-4o',
  instructions: 'Create detailed implementation plans.'
});

Computer Agents (`agentType: 'computer'`)

Purpose: Code execution, file operations, terminal commands

Execution: Codex SDK (computer-use agent)

Capabilities:

Read and write files
Run shell commands
Execute code
Install packages
Run tests

Requirements:

Runtime (LocalRuntime or CloudRuntime)
Workspace (git repository)
OpenAI API key

Use cases:

Writing code
Running tests
Modifying files
Building projects
Deploying applications

Example:


import { Agent, LocalRuntime } from 'computer-agents';

const executor = new Agent({
  agentType: 'computer',
  runtime: new LocalRuntime(),
  workspace: './project',
  instructions: 'Execute code changes as requested.'
});
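The two agent types differ mainly in what they require. One way to picture that split is a TypeScript discriminated union on `agentType`. The sketch below is illustrative only: `LlmAgentConfig`, `ComputerAgentConfig`, `RuntimeRef`, and `missingRequirements` are hypothetical names, not part of the computer-agents API, and the runtime is modeled as a plain object standing in for `LocalRuntime` or `CloudRuntime`.

```typescript
// Hypothetical stand-in for a LocalRuntime or CloudRuntime instance.
type RuntimeRef = { kind: 'local' | 'cloud' };

// Illustrative config shapes mirroring the requirements listed above.
interface LlmAgentConfig {
  agentType: 'llm';
  model: string; // e.g. 'gpt-4o'
  instructions?: string;
}

interface ComputerAgentConfig {
  agentType: 'computer';
  runtime: RuntimeRef;
  workspace: string; // path to a git repository
  instructions?: string;
}

type AgentConfig = LlmAgentConfig | ComputerAgentConfig;

// Lists unmet requirements; an empty array means the config looks usable.
function missingRequirements(
  config: AgentConfig,
  env: Record<string, string | undefined>,
): string[] {
  const missing: string[] = [];
  // Both agent types need an OpenAI API key.
  if (!env['OPENAI_API_KEY']) missing.push('OPENAI_API_KEY');
  // Narrowing on agentType exposes the type-specific fields.
  switch (config.agentType) {
    case 'llm':
      if (config.model.trim() === '') missing.push('model');
      break;
    case 'computer':
      if (config.workspace.trim() === '') missing.push('workspace');
      break;
  }
  return missing;
}
```

Because `runtime` is a required field of the computer variant, forgetting it is caught at compile time rather than by the runtime check.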

Runtimes

Runtimes determine where and how computer agents execute.

LocalRuntime

Execution environment: Your local machine

How it works:

Codex SDK runs locally as a subprocess
Operates directly on your workspace files
Fast execution (no network overhead)
Free (no cloud costs)

When to use:

Development and testing
Fast iteration cycles
Small to medium projects
No isolation requirements

Configuration:


import { LocalRuntime } from 'computer-agents';

const runtime = new LocalRuntime({
  debug: true  // Show execution details
});

CloudRuntime

Execution environment: GCE VM with GCS persistence

How it works:

Upload workspace to GCS
POST task to GCE VM API
Codex SDK executes on VM
Download updated workspace from GCS
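The four steps can be sketched as one function over two small interfaces. Everything here is an assumption for illustration: `ObjectStore` stands in for the GCS bucket, `ExecutionApi` for the VM's HTTP endpoint, and the in-memory fakes (`MemoryStore`, `FakeVm`) let the flow run without any cloud resources. This is not CloudRuntime's actual implementation.

```typescript
// A workspace as a map from file path to contents (a simplification).
type Workspace = Map<string, string>;

// Hypothetical stand-in for the GCS bucket.
interface ObjectStore {
  upload(key: string, files: Workspace): Promise<void>;
  download(key: string): Promise<Workspace>;
}

// Hypothetical stand-in for the VM's task endpoint.
interface ExecutionApi {
  postTask(key: string, task: string): Promise<string>;
}

// Follows the four steps listed above, in order.
async function runInCloud(
  store: ObjectStore,
  api: ExecutionApi,
  key: string,
  local: Workspace,
  task: string,
): Promise<{ output: string; workspace: Workspace }> {
  await store.upload(key, local);               // 1. upload workspace
  const output = await api.postTask(key, task); // 2-3. POST task; agent runs remotely
  const workspace = await store.download(key);  // 4. download updated workspace
  return { output, workspace };
}

// In-memory fakes so the flow can run without cloud resources.
class MemoryStore implements ObjectStore {
  private readonly objects = new Map<string, Workspace>();

  async upload(key: string, files: Workspace): Promise<void> {
    this.objects.set(key, new Map(files)); // store a copy
  }

  async download(key: string): Promise<Workspace> {
    const stored = this.objects.get(key);
    return stored ? new Map(stored) : new Map<string, string>();
  }
}

class FakeVm implements ExecutionApi {
  private readonly store: MemoryStore;

  constructor(store: MemoryStore) {
    this.store = store;
  }

  // Pretends to run the agent: reads the uploaded files, adds one, writes them back.
  async postTask(key: string, task: string): Promise<string> {
    const files = await this.store.download(key);
    files.set('app.py', `# generated for: ${task}\n`);
    await this.store.upload(key, files);
    return 'created app.py';
  }
}
```

The round trip through storage on both sides of the task is also why the overhead listed below grows with workspace size.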

When to use:

Production workloads
Isolated execution
Large repositories
Usage tracking/billing required
Team collaboration

Configuration:


import { CloudRuntime } from 'computer-agents';

const runtime = new CloudRuntime({
  apiKey: process.env.TESTBASE_API_KEY,
  debug: true  // Show execution details
});

Performance characteristics:

Small workspaces (< 10 files): ~2-4s overhead
Medium workspaces (50-100 files): ~10-20s overhead
Large workspaces (500+ files): ~40-60s overhead
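To see which of these buckets a workspace falls into, you can count its files. This is a minimal sketch using Node's `fs` module; `countFiles` and `sizeBucket` are hypothetical helpers, and skipping `.git` and `node_modules` is an assumption made here, since this page does not say which files CloudRuntime uploads.

```typescript
import { readdirSync } from 'node:fs';
import { join } from 'node:path';

// Recursively counts regular files, skipping directories named in `skip`
// (excluding .git and node_modules is an assumption, not documented behavior).
function countFiles(
  dir: string,
  skip: ReadonlySet<string> = new Set(['.git', 'node_modules']),
): number {
  let total = 0;
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    if (entry.isDirectory()) {
      if (!skip.has(entry.name)) total += countFiles(join(dir, entry.name), skip);
    } else if (entry.isFile()) {
      total += 1;
    }
  }
  return total;
}

// Maps a file count onto the buckets quoted above; the list leaves gaps between them.
function sizeBucket(files: number): string {
  if (files < 10) return 'small (~2-4s overhead)';
  if (files >= 50 && files <= 100) return 'medium (~10-20s overhead)';
  if (files >= 500) return 'large (~40-60s overhead)';
  return 'between the documented buckets';
}
```

For example, `sizeBucket(countFiles('./project'))` reports the bucket for a local project directory.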

See Cloud Platform for detailed documentation.

Session management

Testbase automatically manages session continuity across multiple run() calls.

How it works

First run: Creates new Codex SDK thread
Subsequent runs: Reuses same thread automatically
Thread ID: Available as agent.currentThreadId
Reset: Call agent.resetSession() to start fresh
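The reuse rule above amounts to a small state machine: no thread until the first run, the same thread afterwards, and none again after a reset. The sketch below models only that behavior with a hypothetical `SessionTracker` class and locally generated IDs; it does not call the Codex SDK and is not how the SDK stores threads.

```typescript
type Thread = { id: string; history: string[] };

// Models only the reuse rule; IDs are generated locally, not by the Codex SDK.
class SessionTracker {
  private thread: Thread | null = null;
  private nextId = 1;

  get currentThreadId(): string | null {
    return this.thread ? this.thread.id : null;
  }

  // The first task creates a thread; later tasks are appended to the same one.
  send(task: string): Thread {
    const thread: Thread = this.thread ?? { id: `thread-${this.nextId++}`, history: [] };
    this.thread = thread;
    thread.history.push(task);
    return thread;
  }

  // Forget the current thread so the next send() starts a new one.
  resetSession(): void {
    this.thread = null;
  }
}
```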


import { Agent, run, LocalRuntime } from 'computer-agents';

const agent = new Agent({
  agentType: 'computer',
  runtime: new LocalRuntime(),
  workspace: './project'
});
 
// First run - new session
await run(agent, 'Create app.py');
console.log(agent.currentThreadId);  // "thread-abc-123"
 
// Second run - continues session
await run(agent, 'Add error handling');
console.log(agent.currentThreadId);  // "thread-abc-123" (same!)
 
// Reset to start new session
agent.resetSession();
await run(agent, 'New project');
console.log(agent.currentThreadId);  // "thread-xyz-789" (new!)

Benefits

No manual management: Sessions just work
Context preservation: Agent remembers previous conversation
Incremental development: Build up complex implementations step-by-step
Natural workflows: Matches how humans work (“now add…”, “fix that…”)

MCP server integration

Testbase uses a unified MCP configuration format that works for both agent types.

Configuration format


import { Agent, LocalRuntime } from 'computer-agents';
import type { McpServerConfig } from 'computer-agents';
 
const mcpServers: McpServerConfig[] = [
  // Local stdio server
  {
    type: 'stdio',
    name: 'filesystem',
    command: 'npx',
    args: ['@modelcontextprotocol/server-filesystem', '/workspace']
  },
  // Remote HTTP server
  {
    type: 'http',
    name: 'notion',
    url: 'https://notion-mcp.example.com/mcp',
    bearerToken: process.env.NOTION_TOKEN
  }
];
 
// Works for both agent types!
const llmAgent = new Agent({
  agentType: 'llm',
  model: 'gpt-4o',
  mcpServers  // Auto-converted to function tools
});
 
const computerAgent = new Agent({
  agentType: 'computer',
  runtime: new LocalRuntime(),
  workspace: './project',
  mcpServers  // Passed to Codex SDK
});
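The two entries in `mcpServers` differ by transport, and the `type` field tells them apart. The types below are an illustrative sketch of that discriminated union, not the SDK's actual `McpServerConfig` definition, which may carry more fields; `describeServer` is a hypothetical helper showing how narrowing on `type` exposes the transport-specific fields.

```typescript
// Illustrative types only; the SDK's own McpServerConfig may differ.
type StdioServer = { type: 'stdio'; name: string; command: string; args?: string[] };
type HttpServer = { type: 'http'; name: string; url: string; bearerToken?: string };
type McpServer = StdioServer | HttpServer;

// Narrowing on `type` gives access to the fields each transport needs.
function describeServer(server: McpServer): string {
  switch (server.type) {
    case 'stdio':
      // Local servers are launched as a subprocess.
      return `${server.name}: spawn ${[server.command, ...(server.args ?? [])].join(' ')}`;
    case 'http':
      // Remote servers are reached over HTTP, optionally with a bearer token.
      return `${server.name}: ${server.url}${server.bearerToken ? ' (authenticated)' : ''}`;
  }
}
```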

How MCP servers are used

LLM agents:

MCP servers converted to OpenAI function tools
Agent can call tools during execution
Results returned to LLM for processing

Computer agents:

MCP servers passed directly to Codex SDK
Codex manages tool selection and execution
Integrated with computer-use capabilities
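For LLM agents, conversion means mapping each tool an MCP server lists (a `name`, an optional `description`, and a JSON Schema `inputSchema`) to an OpenAI function tool. The sketch targets the Chat Completions `tools` shape; the SDK's real conversion may use a different format (for example the Responses API's flatter tool objects). Prefixing names with the server name is a hypothetical convention to avoid collisions between servers. OpenAI function names may only contain letters, digits, underscores, and dashes, up to 64 characters, so other characters are replaced.

```typescript
// Minimal JSON Schema object shape, as used for tool inputs.
type JsonSchemaObject = {
  type: 'object';
  properties?: Record<string, unknown>;
  required?: string[];
};

// A tool entry as returned by an MCP server's tools/list request.
interface McpTool {
  name: string;
  description?: string;
  inputSchema: JsonSchemaObject;
}

// OpenAI Chat Completions function-tool shape.
interface OpenAiFunctionTool {
  type: 'function';
  function: { name: string; description?: string; parameters: JsonSchemaObject };
}

// Prefixes each tool with its server name (hypothetical convention), replaces
// characters OpenAI function names do not allow, and truncates to 64 characters.
function toFunctionTools(serverName: string, tools: McpTool[]): OpenAiFunctionTool[] {
  return tools.map((tool): OpenAiFunctionTool => ({
    type: 'function',
    function: {
      name: `${serverName}_${tool.name}`.replace(/[^a-zA-Z0-9_-]/g, '_').slice(0, 64),
      description: tool.description,
      parameters: tool.inputSchema, // the JSON Schema passes through unchanged
    },
  }));
}
```

When the model calls one of these functions, the agent would route the call back to the originating MCP server and return the result to the model, matching the three steps listed for LLM agents.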

See MCP Integration for detailed documentation.

Multi-agent workflows

Testbase supports manual workflow composition: you control how agents interact.

Pattern: Plan → Execute → Review


import { Agent, run, LocalRuntime } from 'computer-agents';

const planner = new Agent({ agentType: 'llm', model: 'gpt-4o' });
const executor = new Agent({ agentType: 'computer', runtime: new LocalRuntime(), workspace: './project' });
const reviewer = new Agent({ agentType: 'llm', model: 'gpt-4o' });
 
const task = 'Add user authentication';
const plan = await run(planner, `Plan: ${task}`);
const implementation = await run(executor, plan.finalOutput);
const review = await run(reviewer, `Review: ${implementation.finalOutput}`);
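Plan → execute → review is a sequential pipeline: each step's output becomes the next step's input. A generic version, with each `run(agent, input)` call abstracted as an async function, might look like the sketch below; `Step` and `pipeline` are hypothetical names, and the stub steps stand in for real agents.

```typescript
// One step turns an input prompt into an output, like run(agent, input).
type Step = (input: string) => Promise<string>;

// Runs the steps in order, feeding each output into the next step.
async function pipeline(input: string, steps: Step[]): Promise<string> {
  let current = input;
  for (const step of steps) {
    current = await step(current);
  }
  return current;
}

// Stub steps standing in for the planner, executor, and reviewer agents.
const plan: Step = async (task) => `plan(${task})`;
const execute: Step = async (p) => `code(${p})`;
const review: Step = async (c) => `review(${c})`;
```

With the real SDK, a step could wrap a call such as `(input) => run(planner, input).then((r) => String(r.finalOutput))`; that wrapper is an assumption about how you might adapt `run`, not an SDK helper.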

Pattern: Iterative refinement


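// Reuses `executor` and `reviewer` from the previous example.
// The reviewer's instructions should tell it to reply "APPROVED" once satisfied.
let currentTask = 'Add user authentication';
let approved = false;
const maxRounds = 5;  // Stop eventually even if the reviewer never approves

for (let round = 0; round < maxRounds && !approved; round++) {
  const implementation = await run(executor, currentTask);
  const review = await run(reviewer, `Review: ${implementation.finalOutput}`);
  const feedback = String(review.finalOutput ?? '');

  if (feedback.includes('APPROVED')) {
    approved = true;
  } else {
    currentTask = `Fix these issues: ${feedback}`;
  }
}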

Pattern: Agent handoffs

Use the OpenAI Agents SDK handoff system:


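import { Agent, run, LocalRuntime } from 'computer-agents';

// Declare agents before any agent lists them in `handoffs`
const reviewer = new Agent({
  agentType: 'llm',
  model: 'gpt-4o',
  instructions: 'Review the implementation and report any issues.'
});

const executor = new Agent({
  agentType: 'computer',
  runtime: new LocalRuntime(),
  workspace: './project',
  handoffs: [reviewer],  // Can hand off to reviewer
  instructions: 'Execute plans and hand off to reviewer.'
});

const planner = new Agent({
  agentType: 'llm',
  model: 'gpt-4o',
  handoffs: [executor],  // Can hand off to executor
  instructions: 'Create plans and hand off to executor.'
});

// Start with planner, it manages handoffs
const result = await run(planner, 'Build a calculator');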
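Handoffs pass control from one agent to another until some agent produces a final answer. The toy model below shows only that control flow, using hypothetical `ToyAgent` and `runWithHandoffs` names. In the OpenAI Agents SDK, the model triggers a handoff by calling a tool that the SDK generates for each entry in `handoffs`, and the SDK performs the routing itself.

```typescript
// Toy model of handoff control flow; not the OpenAI Agents SDK's implementation.
type Outcome =
  | { kind: 'final'; output: string }
  | { kind: 'handoff'; to: string; input: string };

type ToyAgent = {
  name: string;
  handoffs: string[]; // agents this one may hand off to
  act: (input: string) => Outcome;
};

function runWithHandoffs(
  agents: Map<string, ToyAgent>,
  start: string,
  input: string,
  maxTurns = 10,
): { output: string; trace: string[] } {
  const trace: string[] = [];
  let current = agents.get(start);
  let message = input;
  for (let turn = 0; turn < maxTurns; turn++) {
    if (!current) throw new Error('unknown agent');
    trace.push(current.name);
    const outcome = current.act(message);
    if (outcome.kind === 'final') return { output: outcome.output, trace };
    // Only handoffs declared on the agent are allowed.
    if (!current.handoffs.includes(outcome.to)) {
      throw new Error(`${current.name} cannot hand off to ${outcome.to}`);
    }
    current = agents.get(outcome.to);
    message = outcome.input;
  }
  throw new Error(`no final output after ${maxTurns} turns`);
}
```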

Cloud infrastructure

The cloud platform provides isolated execution with billing and monitoring.

Components

GCE VM (34.170.205.13:8080):

Express API server
Codex SDK execution
API key authentication
SQLite database

GCS Bucket (gs://testbase-workspaces):

Workspace persistence
Session metadata
Thread cache

API Keys:

Standard keys: Pay-per-token billing
Internal keys: Unlimited usage (testing)

Billing System:

Token usage tracking
Credit-based payments
Daily/monthly spending limits
Real-time balance checks
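A pre-run check that combines the key types and limits above might look like the sketch below. The `Account` fields, the credit units, and checking against an up-front cost estimate are all assumptions for illustration; this page does not describe how the platform actually enforces limits.

```typescript
// Hypothetical account state; field names and credit units are illustrative.
interface Account {
  keyType: 'standard' | 'internal';
  balanceCredits: number;
  dailyLimit: number;
  monthlyLimit: number;
  spentToday: number;
  spentThisMonth: number;
}

type Decision = { allowed: true } | { allowed: false; reason: string };

// Checks whether a run with an estimated cost (in credits) may start.
function canRun(account: Account, estimate: number): Decision {
  // Internal keys have unlimited usage (testing only).
  if (account.keyType === 'internal') return { allowed: true };
  if (account.balanceCredits < estimate) return { allowed: false, reason: 'insufficient balance' };
  if (account.spentToday + estimate > account.dailyLimit) return { allowed: false, reason: 'daily limit' };
  if (account.spentThisMonth + estimate > account.monthlyLimit) {
    return { allowed: false, reason: 'monthly limit' };
  }
  return { allowed: true };
}
```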

See Cloud Platform for complete documentation.

Design decisions

Why only two agent types?

Simplicity over flexibility. Specialized roles (planner, reviewer) are workflow patterns, not fundamental types. You compose them manually rather than having built-in orchestration.

Why runtime abstraction?

Portability. The same agent code runs locally or in the cloud; only the runtime configuration changes. Develop locally, then deploy to the cloud without rewriting your agents.
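Portability comes from agent code depending on a runtime interface rather than on a concrete class. The sketch uses hypothetical `Runtime`, `FakeLocalRuntime`, and `FakeCloudRuntime` types (the SDK's real runtime interface is not shown on this page) and picks a runtime from the environment; keying that choice on `TESTBASE_API_KEY` is an assumption, not documented behavior.

```typescript
// Hypothetical runtime interface; the SDK's real Runtime type is not shown on this page.
interface Runtime {
  readonly name: string;
  execute(task: string, workspace: string): Promise<string>;
}

class FakeLocalRuntime implements Runtime {
  readonly name = 'local';
  async execute(task: string, workspace: string): Promise<string> {
    return `[local] ${task} in ${workspace}`;
  }
}

class FakeCloudRuntime implements Runtime {
  readonly name = 'cloud';
  constructor(apiKey: string) {
    if (apiKey === '') throw new Error('cloud runtime needs an API key');
  }
  async execute(task: string, workspace: string): Promise<string> {
    // A real cloud runtime would upload, call the API, and download here.
    return `[cloud] ${task} in ${workspace}`;
  }
}

// Choosing by TESTBASE_API_KEY is an assumption made for this sketch.
function createRuntime(env: Record<string, string | undefined>): Runtime {
  const apiKey = env['TESTBASE_API_KEY'];
  return apiKey ? new FakeCloudRuntime(apiKey) : new FakeLocalRuntime();
}

// The caller depends only on Runtime, so either implementation works unchanged.
async function runTask(runtime: Runtime, task: string): Promise<string> {
  return runtime.execute(task, './project');
}
```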

Why manual composition?

Transparency. No hidden orchestration logic. You see and control exactly how agents interact.

Why Codex SDK?

Performance and reliability. Direct SDK integration instead of CLI spawning. Clean async/await API with built-in session management.

Next steps

Agents SDK - Detailed agent configuration and patterns
Cloud Platform - Production deployment and billing
Workflows - Advanced workflow patterns
MCP Integration - Connect external tools

Understanding this architecture will help you build sophisticated agent systems with confidence.