Spec Reference
The main fields used in `env.json` and `task.json`.
This page is a practical reference for the main fields in environment and task config files.
We don't list every dataclass field in the codebase — just the set most people need when reading or writing specs.
Environment Specs: env.json
An environment spec defines the shared world that tasks run inside.
Common top-level fields are:
- `id`: environment identifier
- `version`
- `description`
- `base`: preset base environment
- `image` or `dockerfile`
- `resources`
- `mounts`
- `observation`
- `action`
- `recording`
- `security`
- `hooks`
- `user_accounts`
- `runner`
- `os_type`
Supported observation types
We currently support these observation types:
- `rgb_screen`
- `ui_tree`
- `audio_waveform`
- `cli_stdout`
Supported action types
We currently support these action types:
- `mouse`
- `keyboard`
- `voice`
- `api_call`
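The two lists above lend themselves to a quick sanity check before an environment is launched. The following is a minimal sketch, not the project's actual validator: the function name `find_unsupported_types` is hypothetical, and it assumes `observation` and `action` are lists of objects that each carry a `type` key, as in the minimal example on this page.

```python
# Supported types copied from the lists on this page.
SUPPORTED_OBSERVATION_TYPES = {"rgb_screen", "ui_tree", "audio_waveform", "cli_stdout"}
SUPPORTED_ACTION_TYPES = {"mouse", "keyboard", "voice", "api_call"}


def find_unsupported_types(env_spec: dict) -> list[str]:
    """Return one message per observation/action entry with an unknown type.

    Hypothetical helper: assumes each entry is a dict with a "type" key.
    """
    problems = []
    for group, supported in (
        ("observation", SUPPORTED_OBSERVATION_TYPES),
        ("action", SUPPORTED_ACTION_TYPES),
    ):
        for index, entry in enumerate(env_spec.get(group, [])):
            entry_type = entry.get("type")
            if entry_type not in supported:
                problems.append(f"{group}[{index}]: unsupported type {entry_type!r}")
    return problems
```

An empty result means every declared type appears in the lists above; a missing `observation` or `action` key is treated as an empty list rather than an error.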
Common env.json groups
resources
- CPU, memory, GPU, networking
mounts
- host folders made available inside the environment
observation
- what the agent receives, such as `rgb_screen`
action
- what the agent can send, such as `mouse` and `keyboard`
recording
- whether episode recording is enabled and where outputs go
security
- runtime privilege and container/VM settings
hooks
- lifecycle hooks such as `pre_start`, `post_start`, and `reset`
user_accounts
- declared accounts, credentials, and permissions
Minimal example
{
  "id": "demo.moodle@0.1",
  "base": "ubuntu-gnome-systemd_highres",
  "observation": [
    {"type": "rgb_screen", "fps": 10, "resolution": [1920, 1080]}
  ],
  "action": [
    {"type": "mouse"},
    {"type": "keyboard"}
  ],
  "mounts": [
    {
      "source": "benchmarks/cua_world/environments/moodle_env/tasks",
      "target": "/workspace/tasks",
      "mode": "ro"
    }
  ],
  "hooks": {
    "pre_start": "/workspace/scripts/install_moodle.sh",
    "post_start": "/workspace/scripts/setup_moodle.sh"
  }
}
Task Specs: task.json
A task spec defines one job inside an environment.
Common top-level fields are:
- `id`
- `env_id`
- `version`
- `description`
- `difficulty`
- `init`
- `hooks`
- `success`
- `metadata`
Common task.json groups
init
- timeout, step limit, reward type, optional init helpers
hooks
- `pre_task`, `post_task`, and `pre_task_timeout`
success
- how success is checked
metadata
- task-specific values used by the checker or supporting code
Task hook fields
Current task hook fields are:
- `pre_task`
- `post_task`
- `pre_task_timeout`
Success modes
We currently support these success modes:
- `program`
- `image_match`
- `multi`
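In `program` mode, the examples on this page name the checker with a string of the form `file.py::function`, for instance `verifier.py::verify_enroll_student`. The sketch below shows one way such a reference could be resolved to a Python callable. It is an illustration under stated assumptions, not the loader's real code: `load_checker` is a hypothetical name, the file part is assumed to be relative to a task directory passed in as `base_dir`, and the arguments the checker expects are not covered here.

```python
import importlib.util
from pathlib import Path
from typing import Callable


def load_checker(reference: str, base_dir: Path) -> Callable:
    """Resolve a 'file.py::function' reference to a callable.

    Assumption: the file part is relative to base_dir. The real loader
    may resolve paths differently.
    """
    file_part, sep, func_name = reference.partition("::")
    if not sep or not file_part or not func_name:
        raise ValueError(f"expected 'file.py::function', got {reference!r}")

    # Import the file as a standalone module without touching sys.path.
    path = base_dir / file_part
    module_spec = importlib.util.spec_from_file_location(path.stem, path)
    if module_spec is None or module_spec.loader is None:
        raise ImportError(f"cannot load checker module from {path}")
    module = importlib.util.module_from_spec(module_spec)
    module_spec.loader.exec_module(module)

    # Look up the named function and make sure it can be called.
    checker = getattr(module, func_name, None)
    if not callable(checker):
        raise AttributeError(f"{path} has no callable named {func_name!r}")
    return checker
```

Failing early on a malformed reference or a missing function keeps a typo in `task.json` from surfacing only at the end of an episode.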
Most of our benchmark tasks use `program` mode, with a checker reference such as:
{
  "success": {
    "mode": "program",
    "spec": {
      "program": "verifier.py::verify_enroll_student"
    }
  }
}
Minimal example
{
  "id": "enroll_student@1",
  "env_id": "moodle_env@0.1",
  "description": "Enroll the student 'Jane Doe' in the 'Intro to Biology' course.",
  "init": {
    "timeout_sec": 300,
    "max_steps": 40,
    "reward_type": "sparse"
  },
  "hooks": {
    "pre_task": "/workspace/tasks/enroll_student/setup_task.sh",
    "post_task": "/workspace/tasks/enroll_student/export_result.sh"
  },
  "success": {
    "mode": "program",
    "spec": {
      "program": "verifier.py::verify_enroll_student"
    }
  }
}
A Practical Note About Extra Fields
The loader recognizes the common environment and task fields directly. Some benchmark files also include extra top-level keys — those may be preserved as metadata or ignored unless code reads them explicitly.
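To see which keys in a spec file fall outside the common fields, a small helper can separate the two groups. This is a rough sketch only: `split_extra_keys` is a hypothetical name, and the field sets are copied from the lists on this page, so they cover the common fields rather than everything the loader may recognize. It also assumes `image` and `dockerfile` are two separate keys.

```python
# Common top-level fields as listed on this page (not exhaustive).
ENV_FIELDS = {
    "id", "version", "description", "base", "image", "dockerfile",
    "resources", "mounts", "observation", "action", "recording",
    "security", "hooks", "user_accounts", "runner", "os_type",
}
TASK_FIELDS = {
    "id", "env_id", "version", "description", "difficulty",
    "init", "hooks", "success", "metadata",
}


def split_extra_keys(spec: dict, known: set[str]) -> tuple[dict, dict]:
    """Split a spec into (fields listed on this page, everything else)."""
    recognized = {key: value for key, value in spec.items() if key in known}
    extra = {key: value for key, value in spec.items() if key not in known}
    return recognized, extra
```

For example, `split_extra_keys(task_spec, TASK_FIELDS)` returns the extra keys as the second element, which makes it easy to review whether any code actually reads them.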