Architecture
Testbase is built around simplicity and flexibility. Two agent types, two runtimes, and a unified MCP configuration are the only building blocks you need to compose multi-agent systems.
Core principles
- Simplicity - Two agent types ('llm' and 'computer'), clear separation of concerns
- Runtime abstraction - Switch between local and cloud execution without changing code
- Manual composition - You control workflow orchestration (no magic)
- Session continuity - Automatic thread management across multiple tasks
- Professional tooling - Production-ready billing, authentication, and monitoring
High-level architecture
┌──────────────────────────────────────────────────────────┐
│                     Your Application                     │
│           (Multi-agent workflow orchestration)           │
└──────────────────────────────────────────────────────────┘
                             │
                             ▼
┌──────────────────────────────────────────────────────────┐
│                   computer-agents SDK                    │
│  • Agent class (unified interface)                       │
│  • run() function (execution loop)                       │
│  • Runtime abstraction                                   │
│  • MCP server integration                                │
│  • Session management                                    │
└──────────────────────────────────────────────────────────┘
           │                                  │
           ▼                                  ▼
┌─────────────────────┐          ┌─────────────────────────┐
│    LocalRuntime     │          │      CloudRuntime       │
│   (Your Machine)    │          │        (GCE VM)         │
├─────────────────────┤          ├─────────────────────────┤
│ • Codex SDK         │          │ • Workspace upload      │
│ • Local workspace   │          │ • API authentication    │
│ • Direct execution  │          │ • GCE execution         │
│ • Fast iteration    │          │ • Workspace download    │
│ • Free              │          │ • Usage billing         │
└─────────────────────┘          └─────────────────────────┘
                                              │
                                              ▼
                               ┌─────────────────────────────┐
                               │ Testbase Cloud (GCE VM)     │
                               │ • Express API server        │
                               │ • Codex SDK execution       │
                               │ • GCS workspace sync        │
                               │ • API key auth              │
                               │ • Usage tracking & billing  │
                               │ • SQLite database           │
                               └─────────────────────────────┘
                                              │
                                              ▼
                               ┌─────────────────────────────┐
                               │ Google Cloud Storage        │
                               │ • Workspace persistence     │
                               │ • Session metadata          │
                               │ • Thread cache              │
                               └─────────────────────────────┘
Agent types
Testbase has exactly two agent types:
LLM Agents (agentType: 'llm')
Purpose: Reasoning, planning, analysis, review
Execution: OpenAI API (direct API calls)
Capabilities:
- Chat and reasoning
- Tool/function calling
- JSON schema output
- Handoffs to other agents
Requirements:
- Model name (e.g., 'gpt-4o')
- OpenAI API key
Use cases:
- Creating implementation plans
- Reviewing code for quality
- Analyzing requirements
- Decision-making
Example:
const planner = new Agent({
  agentType: 'llm',
  model: 'gpt-4o',
  instructions: 'Create detailed implementation plans.'
});
Computer Agents (agentType: 'computer')
Purpose: Code execution, file operations, terminal commands
Execution: Codex SDK (computer-use agent)
Capabilities:
- Read and write files
- Run shell commands
- Execute code
- Install packages
- Run tests
Requirements:
- Runtime (LocalRuntime or CloudRuntime)
- Workspace (git repository)
- OpenAI API key
Use cases:
- Writing code
- Running tests
- Modifying files
- Building projects
- Deploying applications
Example:
const executor = new Agent({
  agentType: 'computer',
  runtime: new LocalRuntime(),
  workspace: './project',
  instructions: 'Execute code changes as requested.'
});
Runtimes
Runtimes determine where and how computer agents execute.
LocalRuntime
Execution environment: Your local machine
How it works:
- Codex SDK runs locally as a subprocess
- Operates directly on your workspace files
- Fast execution (no network overhead)
- Free (no cloud costs)
When to use:
- Development and testing
- Fast iteration cycles
- Small to medium projects
- No isolation requirements
Configuration:
const runtime = new LocalRuntime({
  debug: true // Show execution details
});
CloudRuntime
Execution environment: GCE VM with GCS persistence
How it works:
- Upload workspace to GCS
- POST task to GCE VM API
- Codex SDK executes on VM
- Download updated workspace from GCS
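The four steps above can be read as a simple sequential pipeline. The sketch below is a minimal illustration under stated assumptions, not CloudRuntime's actual implementation: the `CloudSteps` interface, `runInCloud`, and the in-memory `fakeSteps` are hypothetical names, and it assumes the task request resolves only once the VM has finished executing.

```typescript
// Hypothetical step functions standing in for the real upload, API, and download calls.
interface CloudSteps {
  uploadWorkspace(localPath: string): Promise<string>; // resolves to a workspace reference
  postTask(workspaceRef: string, task: string): Promise<string>; // resolves to the agent output
  downloadWorkspace(workspaceRef: string, localPath: string): Promise<void>;
}

// Runs the documented steps in order; a failing step rejects the whole call.
async function runInCloud(steps: CloudSteps, localPath: string, task: string): Promise<string> {
  const workspaceRef = await steps.uploadWorkspace(localPath); // 1. upload workspace
  const output = await steps.postTask(workspaceRef, task); // 2-3. post task, VM executes it
  await steps.downloadWorkspace(workspaceRef, localPath); // 4. download updated workspace
  return output;
}

// In-memory fake that records the call order (illustration only).
const callLog: string[] = [];
const fakeSteps: CloudSteps = {
  async uploadWorkspace(localPath) {
    callLog.push(`upload ${localPath}`);
    return 'workspace-1';
  },
  async postTask(workspaceRef, task) {
    callLog.push(`post ${workspaceRef}`);
    return `done: ${task}`;
  },
  async downloadWorkspace(workspaceRef, localPath) {
    callLog.push(`download ${workspaceRef} to ${localPath}`);
  },
};
```

Calling `runInCloud(fakeSteps, './project', 'Run tests')` exercises the ordering without any network access, which is also a convenient way to unit-test code that wraps a cloud run.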
When to use:
- Production workloads
- Isolated execution
- Large repositories
- Usage tracking/billing required
- Team collaboration
Configuration:
const runtime = new CloudRuntime({
  apiKey: process.env.TESTBASE_API_KEY,
  debug: true // Show execution details
});
Performance characteristics:
- Small workspaces (< 10 files): ~2-4s overhead
- Medium workspaces (50-100 files): ~10-20s overhead
- Large workspaces (500+ files): ~40-60s overhead
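If you want to surface these figures in tooling (for example, to warn before a slow run), the brackets above can be encoded as a lookup. This is a small sketch, not part of the SDK: `documentedCloudOverhead` is a hypothetical helper, and it deliberately returns `null` for file counts that fall between the listed brackets rather than interpolating numbers the list does not give.

```typescript
type OverheadRange = { minSeconds: number; maxSeconds: number };

// Returns the documented overhead range for a workspace of the given size,
// or null when the size falls between the documented brackets.
function documentedCloudOverhead(fileCount: number): OverheadRange | null {
  if (!Number.isInteger(fileCount) || fileCount < 0) {
    throw new RangeError('fileCount must be a non-negative integer');
  }
  if (fileCount < 10) return { minSeconds: 2, maxSeconds: 4 }; // small
  if (fileCount >= 50 && fileCount <= 100) return { minSeconds: 10, maxSeconds: 20 }; // medium
  if (fileCount >= 500) return { minSeconds: 40, maxSeconds: 60 }; // large
  return null; // not covered by the documented figures
}
```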
See Cloud Platform for detailed documentation.
Session management
Testbase automatically manages session continuity across multiple run() calls.
How it works
- First run: Creates new Codex SDK thread
- Subsequent runs: Reuses same thread automatically
- Thread ID: Available as agent.currentThreadId
- Reset: Call agent.resetSession() to start fresh
import { Agent, run, LocalRuntime } from 'computer-agents';

const agent = new Agent({
  agentType: 'computer',
  runtime: new LocalRuntime(),
  workspace: './project'
});

// First run - new session
await run(agent, 'Create app.py');
console.log(agent.currentThreadId); // "thread-abc-123"

// Second run - continues session
await run(agent, 'Add error handling');
console.log(agent.currentThreadId); // "thread-abc-123" (same!)

// Reset to start new session
agent.resetSession();
await run(agent, 'New project');
console.log(agent.currentThreadId); // "thread-xyz-789" (new!)
Benefits
- No manual management: Sessions just work
- Context preservation: Agent remembers previous conversation
- Incremental development: Build up complex implementations step-by-step
- Natural workflows: Matches how humans work (“now add…”, “fix that…”)
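The reuse-or-create behavior described in this section can be modeled in a few lines. The class below is a stand-in for illustration only, not the SDK's Agent class: `SessionSketch` and `ensureThread` are hypothetical names, and the counter-based thread IDs are made up, since real IDs come from the Codex SDK.

```typescript
// Stand-in for the thread handling described above (not the SDK's Agent class).
class SessionSketch {
  private threadId: string | null = null;
  private created = 0;

  // Mirrors agent.currentThreadId: null until the first run.
  get currentThreadId(): string | null {
    return this.threadId;
  }

  // Called once per run: reuse the current thread, or create one if none exists.
  ensureThread(): string {
    if (this.threadId === null) {
      this.created += 1;
      this.threadId = `thread-${this.created}`; // made-up ID format
    }
    return this.threadId;
  }

  // Mirrors agent.resetSession(): the next run starts a fresh thread.
  resetSession(): void {
    this.threadId = null;
  }
}

const session = new SessionSketch();
const firstRun = session.ensureThread();
const secondRun = session.ensureThread(); // same thread as the first run
session.resetSession();
const afterReset = session.ensureThread(); // new thread
```

The point of the sketch is that continuity is state held on the agent object, so creating a second agent instance would start a separate session.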
MCP server integration
Testbase uses unified MCP configuration that works for both agent types.
Configuration format
import { Agent, LocalRuntime } from 'computer-agents';
import type { McpServerConfig } from 'computer-agents';

const mcpServers: McpServerConfig[] = [
  // Local stdio server
  {
    type: 'stdio',
    name: 'filesystem',
    command: 'npx',
    args: ['@modelcontextprotocol/server-filesystem', '/workspace']
  },
  // Remote HTTP server
  {
    type: 'http',
    name: 'notion',
    url: 'https://notion-mcp.example.com/mcp',
    bearerToken: process.env.NOTION_TOKEN
  }
];

// Works for both agent types!
const llmAgent = new Agent({
  agentType: 'llm',
  mcpServers // Auto-converted to function tools
});

const computerAgent = new Agent({
  agentType: 'computer',
  runtime: new LocalRuntime(),
  mcpServers // Passed to Codex SDK
});
How MCP servers are used
LLM agents:
- MCP servers converted to OpenAI function tools
- Agent can call tools during execution
- Results returned to LLM for processing
Computer agents:
- MCP servers passed directly to Codex SDK
- Codex manages tool selection and execution
- Integrated with computer-use capabilities
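To make the split above concrete, here is a rough sketch of how one configuration list could be routed by agent type. Everything in it is an assumption for illustration: `McpServerSketch` mirrors only the fields shown in the configuration example (the SDK's real `McpServerConfig` may differ), and `planMcpUsage` and `describeServer` are hypothetical helpers, not SDK functions.

```typescript
// Local mirror of the two config shapes shown in the configuration example.
type StdioServer = { type: 'stdio'; name: string; command: string; args?: string[] };
type HttpServer = { type: 'http'; name: string; url: string; bearerToken?: string };
type McpServerSketch = StdioServer | HttpServer;

type McpPlan = {
  mode: 'function-tools' | 'passthrough';
  servers: string[];
};

// LLM agents get servers exposed as function tools; computer agents pass them through.
function planMcpUsage(agentType: 'llm' | 'computer', servers: McpServerSketch[]): McpPlan {
  const names = servers.map((server) => server.name);
  return { mode: agentType === 'llm' ? 'function-tools' : 'passthrough', servers: names };
}

// The 'type' field discriminates the union, so each branch sees only its own fields.
function describeServer(server: McpServerSketch): string {
  switch (server.type) {
    case 'stdio':
      return `${server.name}: spawn ${[server.command, ...(server.args ?? [])].join(' ')}`;
    case 'http':
      return `${server.name}: connect to ${server.url}`;
  }
}
```

Modeling the config as a discriminated union means a stdio entry without `command`, or an http entry without `url`, fails at compile time rather than at run time.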
See MCP Integration for detailed documentation.
Multi-agent workflows
Testbase supports manual workflow composition - you control how agents interact.
Pattern: Plan → Execute → Review
import { Agent, run, LocalRuntime } from 'computer-agents';

const planner = new Agent({ agentType: 'llm', model: 'gpt-4o' });
const executor = new Agent({ agentType: 'computer', runtime: new LocalRuntime() });
const reviewer = new Agent({ agentType: 'llm', model: 'gpt-4o' });

const task = 'Add user authentication';
const plan = await run(planner, `Plan: ${task}`);
const implementation = await run(executor, plan.finalOutput);
const review = await run(reviewer, `Review: ${implementation.finalOutput}`);
Pattern: Iterative refinement
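// Reuses the task, executor, and reviewer defined in the previous pattern
let currentTask = task; // reassigned each round, so it cannot be const
let approved = false;
const maxRounds = 5; // stop eventually if the reviewer never approves

for (let round = 0; round < maxRounds && !approved; round++) {
  const implementation = await run(executor, currentTask);
  const review = await run(reviewer, implementation.finalOutput);
  if (review.finalOutput.includes('APPROVED')) {
    approved = true;
  } else {
    currentTask = `Fix these issues: ${review.finalOutput}`;
  }
}
Pattern: Agent handoffs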
Use the OpenAI Agents SDK handoff system:
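// Define agents in reverse order so each handoff target exists before it is referenced
const reviewer = new Agent({
  agentType: 'llm',
  model: 'gpt-4o',
  instructions: 'Review the implementation.'
});
const executor = new Agent({
  agentType: 'computer',
  runtime: new LocalRuntime(),
  handoffs: [reviewer], // Can hand off to reviewer
  instructions: 'Execute plans and hand off to reviewer.'
});
const planner = new Agent({
  agentType: 'llm',
  model: 'gpt-4o',
  handoffs: [executor], // Can hand off to executor
  instructions: 'Create plans and hand off to executor.'
});

// Start with planner, it manages handoffs
const result = await run(planner, 'Build a calculator');
Cloud infrastructure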
The cloud platform provides isolated execution with billing and monitoring.
Components
GCE VM (34.170.205.13:8080):
- Express API server
- Codex SDK execution
- API key authentication
- SQLite database
GCS Bucket (gs://testbase-workspaces):
- Workspace persistence
- Session metadata
- Thread cache
API Keys:
- Standard keys: Pay-per-token billing
- Internal keys: Unlimited usage (testing)
Billing System:
- Token usage tracking
- Credit-based payments
- Daily/monthly spending limits
- Real-time balance checks
See Cloud Platform for complete documentation.
Design decisions
Why only two agent types?
Simplicity over flexibility. Specialized roles (planner, reviewer) are workflow patterns, not fundamental types. You compose them manually rather than having built-in orchestration.
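With only two types, the differences between them fit in a small discriminated union. The sketch below encodes the requirements listed under Agent types as a check; it is illustrative only, with hypothetical names (`AgentSketch`, `missingRequirements`) and a simplified `runtime` field. The OpenAI API key is left out because it is supplied by the environment rather than by agent configuration in this sketch.

```typescript
// Simplified stand-ins for the two agent configurations.
type LlmAgentSketch = { agentType: 'llm'; model?: string };
type ComputerAgentSketch = { agentType: 'computer'; runtime?: 'local' | 'cloud'; workspace?: string };
type AgentSketch = LlmAgentSketch | ComputerAgentSketch;

// Lists which documented requirements a configuration is missing.
function missingRequirements(config: AgentSketch): string[] {
  const missing: string[] = [];
  if (config.agentType === 'llm') {
    if (!config.model) missing.push('model');
  } else {
    if (!config.runtime) missing.push('runtime');
    if (!config.workspace) missing.push('workspace');
  }
  return missing;
}

// Roles such as planner and reviewer are the same type with different usage.
const plannerSketch: AgentSketch = { agentType: 'llm', model: 'gpt-4o' };
const reviewerSketch: AgentSketch = { agentType: 'llm', model: 'gpt-4o' };
```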
Why runtime abstraction?
Portability. The same agent code runs locally or in the cloud; only the runtime configuration changes. Develop locally, then deploy to the cloud without rewriting your agents.
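One way to keep the switch in configuration is to choose the runtime from the environment. This is a sketch of one possible policy, not SDK behavior: `chooseRuntime` and `RuntimeChoice` are hypothetical, and the rule "use the cloud when TESTBASE_API_KEY is set" is an assumption you may want to replace with an explicit flag.

```typescript
type RuntimeChoice = { kind: 'local' } | { kind: 'cloud'; apiKey: string };

// Picks cloud execution when an API key is present, local execution otherwise.
// An empty string is treated the same as a missing key.
function chooseRuntime(env: Record<string, string | undefined>): RuntimeChoice {
  const apiKey = env['TESTBASE_API_KEY'];
  return apiKey ? { kind: 'cloud', apiKey } : { kind: 'local' };
}

const devChoice = chooseRuntime({});
const prodChoice = chooseRuntime({ TESTBASE_API_KEY: 'example-key' });
```

In an application, the result would map to `new CloudRuntime({ apiKey })` or `new LocalRuntime()`, leaving the agent definitions themselves untouched.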
Why manual composition?
Transparency. No hidden orchestration logic. You see and control exactly how agents interact.
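Because composition is manual, a workflow is ordinary async code that you can read, test, and debug. The sketch below shows the Plan → Execute → Review shape with plain functions standing in for `run(agent, input)`; `Step`, `planExecuteReview`, and the stub steps are hypothetical names used only for illustration.

```typescript
// A stand-in for run(agent, input): any async function from prompt to output.
type Step = (input: string) => Promise<string>;

// Chains the three steps explicitly; every intermediate value is visible to the caller.
async function planExecuteReview(task: string, plan: Step, execute: Step, review: Step) {
  const planText = await plan(`Plan: ${task}`);
  const implementation = await execute(planText);
  const verdict = await review(`Review: ${implementation}`);
  return { planText, implementation, verdict };
}

// Stub steps so the workflow can run without any agents or network access.
const stubPlan: Step = async (input) => `plan for <${input}>`;
const stubExecute: Step = async (input) => `code from <${input}>`;
const stubReview: Step = async (input) => (input.includes('code from') ? 'APPROVED' : 'REJECTED');
```

Swapping a stub for a real agent call changes one argument, which is the practical payoff of keeping orchestration in your own code.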
Why Codex SDK?
Performance and reliability. Direct SDK integration instead of CLI spawning. Clean async/await API with built-in session management.
Next steps
- Agents SDK - Detailed agent configuration and patterns
- Cloud Platform - Production deployment and billing
- Workflows - Advanced workflow patterns
- MCP Integration - Connect external tools
Understanding this architecture will help you build sophisticated agent systems with confidence.