Benchmarks

How the benchmark suite is organized, and what you actually edit when you use it.

Our benchmark suite ships with ready-made environments and tasks.

You use it when:

you want to run something immediately
you want examples of how environments and tasks are written
you want to add your own task on top of an existing environment

The Main Idea

In this repository, a benchmark is usually:

one environment folder
one or more task folders inside that environment

The main benchmark suite lives in:

benchmarks/cua_world/

Most of the time, the part you care about is:

benchmarks/cua_world/environments/

One Real Example

Here is one real benchmark environment:

benchmarks/cua_world/environments/moodle_env/

Inside it, one real task is:

benchmarks/cua_world/environments/moodle_env/tasks/enroll_student/

So the relationship is:

moodle_env = the environment
enroll_student = one task inside that environment

That pattern repeats throughout our benchmark suite.
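The layout above can be explored with a few lines of standard-library Python. This is a minimal sketch, not part of the suite: `list_tasks` is a hypothetical helper, and it assumes only what is described here, namely that each environment folder holds a `tasks/` directory with one subfolder per task.

```python
from pathlib import Path


def list_tasks(environments_dir):
    """Map each environment folder name to the task folder names under its tasks/ directory.

    Hypothetical helper: relies only on the <environment>/tasks/<task>/ layout.
    """
    result = {}
    root = Path(environments_dir)
    for env_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        tasks_dir = env_dir / "tasks"
        if not tasks_dir.is_dir():
            continue  # skip folders that do not follow the environment layout
        result[env_dir.name] = sorted(
            t.name for t in tasks_dir.iterdir() if t.is_dir()
        )
    return result


if __name__ == "__main__":
    root = Path("benchmarks/cua_world/environments")
    if root.is_dir():  # only runs inside a checkout of the repository
        for env_name, tasks in list_tasks(root).items():
            print(f"{env_name}: {len(tasks)} task(s)")
```

Run from the repository root, this prints one line per environment with its task count, which is a quick way to see how the pattern repeats.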

What Is In An Environment Folder

An environment folder usually contains:

env.json
tasks/
support folders such as scripts/, config/, data/, utils/, or assets/

The environment folder is the shared base for all tasks inside it.

What `env.json` Actually Does

env.json is the environment configuration file.

It doesn't just say how the app starts. It usually defines things like:

which base environment or image to use
what observations the agent receives
what actions the agent can send
resource settings such as CPU, memory, and networking
mounted folders such as scripts/, tasks/, config/, or utils/
hooks such as pre_start and post_start
user accounts
recording, VNC, ADB, or other connection settings
runner or OS-specific settings when needed

For example, the Moodle environment config mounts its scripts, tasks, config, utils, and assets folders, and it uses pre_start and post_start hooks to install and set up Moodle before tasks begin.

So if you want to understand how an environment is defined, env.json is the first file to read.
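A quick way to start reading an env.json is to list its top-level keys before opening it in an editor. The sketch below is illustrative only: `top_level_keys` is a hypothetical helper, and it assumes the file is a JSON object at the top level. It does not assume any particular key names, since those vary by environment.

```python
import json
from pathlib import Path


def top_level_keys(env_dir):
    """Return the sorted top-level keys of <env_dir>/env.json.

    Assumption: env.json contains a JSON object at the top level.
    """
    config_path = Path(env_dir) / "env.json"
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    if not isinstance(config, dict):
        raise ValueError(f"{config_path} does not contain a JSON object")
    return sorted(config)


if __name__ == "__main__":
    env_dir = Path("benchmarks/cua_world/environments/moodle_env")
    if (env_dir / "env.json").is_file():  # only runs inside a checkout
        for key in top_level_keys(env_dir):
            print(key)
```

The printed keys give you a map of the file, so you can see which of the areas listed above a given environment actually configures.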

What The Task Folders Add

The task folders under tasks/ add the part that changes from one job to another.

That usually includes:

the instruction for the agent
any task-specific setup
the final check for success

So:

the environment folder defines the shared world
each task folder defines one job inside that world

The Three Common Things People Do Here

1. Run an existing benchmark task

You pick:

an environment folder
a task id inside that environment

Example:

from gym_anything import from_config

env = from_config(
    "benchmarks/cua_world/environments/moodle_env",
    task_id="enroll_student",
)

Or from the CLI:

gym-anything run moodle --task enroll_student -i

2. Read an existing benchmark to understand how it works

The usual reading order is:

env.json
one task folder inside tasks/
that task's task.json
that task's checker

That gives you the environment definition first, and then one concrete task built on top of it.
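The reading order above can be turned into a list of paths to open. This is a rough sketch: `reading_order` is a hypothetical helper, and because the checker's filename is not stated here, it simply lists every other file in the task folder as a candidate rather than guessing a name.

```python
from pathlib import Path


def reading_order(env_dir, task_id):
    """Return paths in the suggested reading order for one task.

    Order: env.json, the task folder, its task.json, then the remaining
    files in the task folder (the checker is assumed to be among them).
    """
    env_dir = Path(env_dir)
    task_dir = env_dir / "tasks" / task_id
    other_files = sorted(
        p for p in task_dir.iterdir() if p.is_file() and p.name != "task.json"
    )
    return [env_dir / "env.json", task_dir, task_dir / "task.json", *other_files]


if __name__ == "__main__":
    env_dir = Path("benchmarks/cua_world/environments/moodle_env")
    if (env_dir / "tasks" / "enroll_student").is_dir():  # only runs in a checkout
        for path in reading_order(env_dir, "enroll_student"):
            print(path)
```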

3. Add a new task to an existing environment

This usually means:

pick an existing environment folder
copy a nearby task folder inside tasks/
change the task description, setup, and final check

That's often much easier than creating a new environment from scratch.
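The copy step can be scripted with the standard library. The following is a minimal sketch under the layout described above: `copy_task` and the new task name are hypothetical, and the function refuses to overwrite an existing task folder.

```python
import shutil
from pathlib import Path


def copy_task(env_dir, source_task, new_task):
    """Copy tasks/<source_task> to tasks/<new_task> inside one environment folder."""
    tasks_dir = Path(env_dir) / "tasks"
    src = tasks_dir / source_task
    dst = tasks_dir / new_task
    if not src.is_dir():
        raise FileNotFoundError(f"no such task folder: {src}")
    if dst.exists():
        raise FileExistsError(f"task folder already exists: {dst}")
    shutil.copytree(src, dst)  # copies the whole folder, including task.json
    return dst


if __name__ == "__main__":
    env_dir = Path("benchmarks/cua_world/environments/moodle_env")
    if (env_dir / "tasks" / "enroll_student").is_dir():  # only runs in a checkout
        # "my_new_task" is a placeholder name for illustration.
        print(copy_task(env_dir, "enroll_student", "my_new_task"))
```

The copy is only a starting point: the new folder still carries the original task's description, setup, and final check, so edit all three before running it. Whether task.json also stores an id that must match the folder name is not covered here, so check the copied file.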

What Split Files Are For

You'll also see split names such as train, test, all, and verified.

These are named lists of tasks. They're mainly used when you want to run many tasks together.

If you're only trying one task by hand, you don't need split files yet.
