Spec Reference

This page is a practical reference for the main fields in environment and task config files.

We don't list every dataclass field in the codebase, only the ones most people need when reading or writing specs.

Environment Specs: `env.json`

An environment spec defines the shared world that tasks run inside.

Common top-level fields are:

id: environment identifier
version
description
base: preset base environment
image or dockerfile
resources
mounts
observation
action
recording
security
hooks
user_accounts
runner
os_type
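Every `id` in this page's examples uses the form `name@version`: `demo.moodle@0.1`, `enroll_student@1`, `moodle_env@0.1`. The minimal sketch below assumes that convention holds in general, which this page does not confirm. It splits an id into its two parts. `split_id` is a hypothetical helper, not a function from the codebase.

```python
from __future__ import annotations


def split_id(spec_id: str) -> tuple[str, str | None]:
    """Split an id such as "demo.moodle@0.1" into (name, version).

    Assumes the "name@version" form used in this page's examples.
    An id with no "@" is returned whole, with None as the version.
    """
    # rpartition splits on the last "@", so a name may itself contain "@".
    name, sep, version = spec_id.rpartition("@")
    if not sep:
        return spec_id, None
    return name, version
```

For example, `split_id("demo.moodle@0.1")` gives `("demo.moodle", "0.1")`. If a spec also sets the separate `version` field, you can compare the two values to catch inconsistencies.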

Supported observation types

We currently support these observation types:

rgb_screen
ui_tree
audio_waveform
cli_stdout

Supported action types

We currently support these action types:

mouse
keyboard
voice
api_call
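The sketch below is a quick pre-flight check against the two supported-type lists above. It assumes that `observation` and `action` are lists of objects with a `"type"` key, as in the minimal example further down. `unsupported_types` is hypothetical and does not reproduce the loader's own validation.

```python
from __future__ import annotations

# Copied from the supported-type lists on this page.
SUPPORTED_OBSERVATIONS = {"rgb_screen", "ui_tree", "audio_waveform", "cli_stdout"}
SUPPORTED_ACTIONS = {"mouse", "keyboard", "voice", "api_call"}


def unsupported_types(env_spec: dict) -> list[str]:
    """Return one message per observation/action entry whose type is not supported."""
    problems = []
    for group, allowed in (("observation", SUPPORTED_OBSERVATIONS),
                           ("action", SUPPORTED_ACTIONS)):
        for i, entry in enumerate(env_spec.get(group, [])):
            kind = entry.get("type")
            if kind not in allowed:
                problems.append(f"{group}[{i}]: unsupported type {kind!r}")
    return problems
```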

Common `env.json` groups

resources

CPU, memory, GPU, networking

mounts

host folders made available inside the environment

observation

what the agent receives, such as rgb_screen

action

what the agent can send, such as mouse and keyboard

recording

whether episode recording is enabled and where outputs go

security

runtime privilege and container/VM settings

hooks

lifecycle hooks such as pre_start, post_start, and reset

user_accounts

declared accounts, credentials, and permissions

Minimal example

{
  "id": "demo.moodle@0.1",
  "base": "ubuntu-gnome-systemd_highres",
  "observation": [
    {"type": "rgb_screen", "fps": 10, "resolution": [1920, 1080]}
  ],
  "action": [
    {"type": "mouse"},
    {"type": "keyboard"}
  ],
  "mounts": [
    {
      "source": "benchmarks/cua_world/environments/moodle_env/tasks",
      "target": "/workspace/tasks",
      "mode": "ro"
    }
  ],
  "hooks": {
    "pre_start": "/workspace/scripts/install_moodle.sh",
    "post_start": "/workspace/scripts/setup_moodle.sh"
  }
}
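The sketch below reviews the `hooks` and `mounts` groups of an environment spec like the one above. It flags hook names that this page does not mention and mounts that lack a `source` or `target`. The page lists hooks only "such as" `pre_start`, `post_start`, and `reset`, so a flagged hook is something to double-check, not necessarily an error. Requiring both `source` and `target` is also an assumption. `review_env_spec` is hypothetical.

```python
from __future__ import annotations

# Hook names mentioned on this page; the list there is not exhaustive.
DOCUMENTED_ENV_HOOKS = {"pre_start", "post_start", "reset"}


def review_env_spec(env_spec: dict) -> list[str]:
    """Return notes about hooks and mounts worth double-checking."""
    notes = []
    for name in env_spec.get("hooks", {}):
        if name not in DOCUMENTED_ENV_HOOKS:
            notes.append(f"hook {name!r} is not one of the hooks named on this page")
    for i, mount in enumerate(env_spec.get("mounts", [])):
        # Assumption: a mount needs at least a host source and a target path.
        for key in ("source", "target"):
            if key not in mount:
                notes.append(f"mounts[{i}] has no {key!r}")
    return notes
```

When run on the minimal example above, this sketch should return an empty list.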

Task Specs: `task.json`

A task spec defines one job inside an environment.

Common top-level fields are:

id
env_id
version
description
difficulty
init
hooks
success
metadata
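A task refers to its environment through `env_id`. The sketch below finds the matching environment spec. It assumes that `env_id` must equal the environment's `id` string exactly, which the field names suggest but this page does not spell out. `resolve_env` is hypothetical.

```python
from __future__ import annotations


def resolve_env(task_spec: dict, env_specs: list[dict]) -> dict:
    """Return the environment spec whose "id" equals the task's "env_id"."""
    by_id = {env["id"]: env for env in env_specs}
    env_id = task_spec.get("env_id")
    if env_id not in by_id:
        raise LookupError(f"no environment with id {env_id!r}")
    return by_id[env_id]
```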

Common `task.json` groups

init

timeout, step limit, reward type, optional init helpers

hooks

pre_task, post_task, and pre_task_timeout

success

how success is checked: a mode plus a mode-specific spec object

metadata

task-specific values used by the checker or supporting code
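The `init` group in the minimal task example below contains `timeout_sec`, `max_steps`, and `reward_type`. The sketch mirrors those three keys in a typed object. `TaskInit` is a hypothetical class, not the dataclass the codebase actually uses. It also ignores the optional init helpers, whose field names this page does not give.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TaskInit:
    """Hypothetical mirror of a task's "init" group (not the codebase's own class)."""
    timeout_sec: int
    max_steps: int
    reward_type: str

    @classmethod
    def from_spec(cls, task_spec: dict) -> TaskInit:
        init = task_spec["init"]
        return cls(
            timeout_sec=int(init["timeout_sec"]),
            max_steps=int(init["max_steps"]),
            reward_type=str(init["reward_type"]),
        )
```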

Task hook fields

Current task hook fields are:

pre_task
post_task
pre_task_timeout
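The sketch below shows one plausible way to enforce `pre_task_timeout` when running the `pre_task` hook. It assumes the timeout is given in seconds and that the hook value is a shell command or script path. It also runs the hook on the local machine purely for illustration; the real runner presumably executes hooks inside the environment. `run_pre_task` is hypothetical.

```python
import subprocess


def run_pre_task(task_spec: dict) -> int:
    """Run the task's pre_task hook and return its exit code (0 if there is none).

    Raises subprocess.TimeoutExpired if the hook outlives pre_task_timeout,
    which is assumed to be in seconds; a missing timeout means no limit.
    """
    hooks = task_spec.get("hooks", {})
    command = hooks.get("pre_task")
    if not command:
        return 0
    timeout = hooks.get("pre_task_timeout")
    # subprocess.run kills the child process before raising TimeoutExpired.
    result = subprocess.run(["/bin/sh", "-c", command], timeout=timeout, check=False)
    return result.returncode
```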

Success modes

We currently support these success modes:

program
image_match
multi

Most of our benchmark tasks use the `program` mode, where `spec.program` names a checker as `file::function`:

{
  "success": {
    "mode": "program",
    "spec": {
      "program": "verifier.py::verify_enroll_student"
    }
  }
}
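A value like `verifier.py::verify_enroll_student` names a Python file and a function inside it. The sketch below resolves that string to a callable using `importlib`. It assumes the file path is relative to a base directory, such as the task's own folder, which this page does not specify. It does not call the checker, because the arguments a checker receives are not documented here. `load_checker` is hypothetical.

```python
from __future__ import annotations

import importlib.util
from pathlib import Path
from typing import Callable


def load_checker(program: str, base_dir: str | Path = ".") -> Callable:
    """Resolve "file.py::function" to the function object."""
    file_part, sep, func_name = program.partition("::")
    if not sep or not func_name:
        raise ValueError(f"expected 'file.py::function', got {program!r}")
    path = Path(base_dir) / file_part
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # executes verifier.py
    return getattr(module, func_name)
```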

Minimal example

{
  "id": "enroll_student@1",
  "env_id": "moodle_env@0.1",
  "description": "Enroll the student 'Jane Doe' in the 'Intro to Biology' course.",
  "init": {
    "timeout_sec": 300,
    "max_steps": 40,
    "reward_type": "sparse"
  },
  "hooks": {
    "pre_task": "/workspace/tasks/enroll_student/setup_task.sh",
    "post_task": "/workspace/tasks/enroll_student/export_result.sh"
  },
  "success": {
    "mode": "program",
    "spec": {
      "program": "verifier.py::verify_enroll_student"
    }
  }
}
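The task hooks above point into `/workspace/tasks`, and the environment example mounts its tasks folder at that path. The sketch below is a consistency check that reports task hooks whose paths fall under no mount target. It is only a hint, not a rule: the environment's own `pre_start` script lives under `/workspace/scripts`, which no mount in the example covers, so scripts can evidently come from elsewhere. `hooks_outside_mounts` is hypothetical.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def hooks_outside_mounts(task_spec: dict, env_spec: dict) -> list[str]:
    """Return names of task hooks whose paths are not under any mount target."""
    targets = [PurePosixPath(m["target"]) for m in env_spec.get("mounts", [])]
    outside = []
    for name in ("pre_task", "post_task"):
        path = task_spec.get("hooks", {}).get(name)
        # is_relative_to needs Python 3.9+.
        if path and not any(PurePosixPath(path).is_relative_to(t) for t in targets):
            outside.append(name)
    return outside
```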

A Practical Note About Extra Fields

The loader recognizes the common environment and task fields directly. Some benchmark files also include extra top-level keys. Those keys may be preserved as metadata or ignored, unless some piece of code reads them explicitly.
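To see which keys in a file fall outside the common fields, you can split a parsed spec in two, as in the sketch below. The field sets are copied from the lists on this page, which do not cover every dataclass field, so a key reported as extra may still be read somewhere. `split_fields` is hypothetical, and `tags` in the usage line is an invented example key.

```python
from __future__ import annotations

# Common top-level fields as listed on this page (not exhaustive).
ENV_FIELDS = {
    "id", "version", "description", "base", "image", "dockerfile",
    "resources", "mounts", "observation", "action", "recording",
    "security", "hooks", "user_accounts", "runner", "os_type",
}
TASK_FIELDS = {
    "id", "env_id", "version", "description", "difficulty",
    "init", "hooks", "success", "metadata",
}


def split_fields(spec: dict, known: set[str]) -> tuple[dict, dict]:
    """Return (recognized, extra) top-level entries of a parsed spec."""
    recognized = {k: v for k, v in spec.items() if k in known}
    extra = {k: v for k, v in spec.items() if k not in known}
    return recognized, extra
```

For example, `split_fields({"id": "enroll_student@1", "tags": ["lms"]}, TASK_FIELDS)` returns `({"id": "enroll_student@1"}, {"tags": ["lms"]})`.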
