Tasks and Checks

How a task is defined, how it starts, and how Gym Anything decides whether it passed.

A task is one specific thing the agent is asked to do inside an environment.

In Moodle, a task might ask the agent to enroll a student in a course.

In some other environment, a task might ask it to configure a setting, export a file, fill in a form, or fix something that's broken.

So a task isn't the whole environment. It's one job inside that environment.

What Makes A Task Different From A Prompt

A task here is more than an instruction string. It also includes setup code, export logic, and an automatic checker, which is what makes it reusable and testable.

It usually comes with:

a written goal
code that prepares the right starting state
code that exports the result at the end
code that checks whether the task was solved

One Real Task Folder

Here is a real task folder from the Moodle benchmark:

benchmarks/cua_world/environments/moodle_env/tasks/enroll_student/
  task.json
  setup_task.sh
  export_result.sh
  verifier.py
  README.md

The files mean:

task.json: the task description and settings
setup_task.sh: prepares the exact starting state
export_result.sh: saves task-specific results at the end
verifier.py: checks whether the task passed
README.md: optional human explanation
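
A quick way to confirm that a task folder has these pieces is a small check like the one below. It is a sketch, not part of Gym Anything: the helper name missing_task_files is hypothetical, and the required-file list only mirrors the enroll_student folder shown above, with README.md treated as optional.

```python
from pathlib import Path

# File names taken from the enroll_student folder above; README.md is optional.
REQUIRED_FILES = ("task.json", "setup_task.sh", "export_result.sh", "verifier.py")


def missing_task_files(task_dir):
    """Return the required file names absent from task_dir (hypothetical helper)."""
    folder = Path(task_dir)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    task_dir = "benchmarks/cua_world/environments/moodle_env/tasks/enroll_student"
    print(missing_task_files(task_dir))
```

An empty list means every file in the assumed list is present. Other tasks may legitimately use a different set of files.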

What `task.json` Usually Contains

task.json is the main file to read first.

In a typical task, it tells you:

the instruction shown to the agent
the timeout and step limit
which setup script runs before the task
which export script runs after the task
which check runs at the end

A simplified example looks like this:

{
  "description": "Enroll the student 'Jane Doe' in the 'Intro to Biology' course.",
  "init": {
    "timeout_sec": 300,
    "max_steps": 40
  },
  "hooks": {
    "pre_task": "/workspace/tasks/enroll_student/setup_task.sh",
    "post_task": "/workspace/tasks/enroll_student/export_result.sh"
  },
  "success": {
    "mode": "program",
    "spec": {
      "program": "verifier.py::verify_enroll_student"
    }
  }
}
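
To read those fields programmatically, something like the following may help. It is a minimal sketch that assumes the key names from the simplified example above; real task.json files may contain more fields or differ in layout. Splitting the program value on "::" into a file name and a function name is also an assumption based on how the example string looks.

```python
import json

# The simplified example from this page, kept as a string so the sketch is runnable.
EXAMPLE_TASK_JSON = """
{
  "description": "Enroll the student 'Jane Doe' in the 'Intro to Biology' course.",
  "init": {"timeout_sec": 300, "max_steps": 40},
  "hooks": {
    "pre_task": "/workspace/tasks/enroll_student/setup_task.sh",
    "post_task": "/workspace/tasks/enroll_student/export_result.sh"
  },
  "success": {
    "mode": "program",
    "spec": {"program": "verifier.py::verify_enroll_student"}
  }
}
"""


def summarize_task(task_json_text):
    """Pull the fields described above out of a task.json string."""
    task = json.loads(task_json_text)
    program = task["success"]["spec"]["program"]
    # Assumption: "file::function" names a verifier file and a function inside it.
    verifier_file, _, verifier_func = program.partition("::")
    return {
        "instruction": task["description"],
        "timeout_sec": task["init"]["timeout_sec"],
        "max_steps": task["init"]["max_steps"],
        "pre_task": task["hooks"]["pre_task"],
        "post_task": task["hooks"]["post_task"],
        "verifier_file": verifier_file,
        "verifier_func": verifier_func,
    }


if __name__ == "__main__":
    for key, value in summarize_task(EXAMPLE_TASK_JSON).items():
        print(f"{key}: {value}")
```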

What Happens When The Task Runs

The normal flow is:

the environment starts
the task setup script runs
the agent receives the task instruction
the agent interacts with the software
the run is ended by a final step with mark_done=True
the task export and final check run

If you're driving the environment directly from Python, the finish step usually looks like this:

obs, reward, done, info = env.step([], mark_done=True)

That's the step that tells Gym Anything to end the task cleanly and run the final task logic.
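
The flow can be sketched end to end with a stand-in environment. FakeEnv and run_episode below are hypothetical and exist only to make the example runnable. The one detail taken from this page is the final env.step([], mark_done=True) call and its four return values; how a real environment is created, and what real actions look like, is not shown here.

```python
class FakeEnv:
    """Hypothetical stand-in that mimics only the step() shape shown above."""

    def __init__(self):
        self.steps_taken = 0
        self.finished = False

    def step(self, actions, mark_done=False):
        self.steps_taken += 1
        if mark_done:
            # A real environment would run the export script and the checker here.
            self.finished = True
        reward = 1.0 if self.finished else 0.0  # placeholder, not real scoring
        info = {"steps_taken": self.steps_taken}
        return None, reward, self.finished, info


def run_episode(env, planned_actions, max_steps):
    """Send the planned actions, then finish the run with mark_done=True."""
    for action in planned_actions[:max_steps]:
        obs, reward, done, info = env.step([action])
        if done:
            return reward, info
    # The finish step: no further actions, just the signal to end the task.
    obs, reward, done, info = env.step([], mark_done=True)
    return reward, info


if __name__ == "__main__":
    reward, info = run_episode(FakeEnv(), ["click", "type"], max_steps=40)
    print(reward, info)
```

Keeping the finish step separate from the action loop makes it harder to forget, which matters because the export and final check only run once the task is ended this way.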

What The Setup Script Is For

The setup script makes the starting point specific to that task.

For the Moodle enroll_student task, the setup script:

resets the Moodle database state
creates the student and course records
restarts the Moodle service
leaves the app ready for the agent to begin

Without that step, the task wouldn't start from a known state.

What The Final Check Is For

The final check decides whether the task succeeded.

Most of our benchmark tasks use a Python check in verifier.py. That check can look at things like:

files created during the run
exported JSON or CSV output
application state
database contents
screenshots from the run

For the Moodle enroll_student task, the checker queries the Moodle database and verifies that the correct student was enrolled in the expected course.

We also support image-based checks and mixed checks, but these are less common.
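
To make the database-style check concrete, here is a minimal sketch of a checker in the spirit of verify_enroll_student. Everything specific in it is an assumption: the function signature, the returned dictionary, and the two-table schema are invented for illustration, and an in-memory SQLite database stands in for Moodle's real database.

```python
import sqlite3


def verify_enrollment(conn, student_name, course_name):
    """Return a pass/fail result for one expected enrollment (hypothetical signature)."""
    row = conn.execute(
        """
        SELECT COUNT(*)
        FROM enrollments AS e
        JOIN courses AS c ON c.id = e.course_id
        WHERE e.student_name = ? AND c.name = ?
        """,
        (student_name, course_name),
    ).fetchone()
    passed = row[0] > 0
    feedback = "enrollment found" if passed else "enrollment missing"
    return {"passed": passed, "feedback": feedback}


def build_demo_db(enrolled):
    """Create a toy database; the schema is a simplified stand-in, not Moodle's."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE courses (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE enrollments (student_name TEXT, course_id INTEGER)")
    conn.execute("INSERT INTO courses (id, name) VALUES (1, 'Intro to Biology')")
    if enrolled:
        conn.execute("INSERT INTO enrollments VALUES ('Jane Doe', 1)")
    return conn


if __name__ == "__main__":
    print(verify_enrollment(build_demo_db(True), "Jane Doe", "Intro to Biology"))
    print(verify_enrollment(build_demo_db(False), "Jane Doe", "Intro to Biology"))
```

The sketch checks the end state rather than the steps the agent took, so any route that leaves the right enrollment in place would pass.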

If You Want To Understand A Task Quickly

Read these files in order:

task.json
setup_task.sh
verifier.py

That gives you the shortest path to understanding:

what the agent is supposed to do
what the task changes before the agent starts
what the checker will accept as success

If You Want To Create A New Task

The simplest workflow is:

copy an existing task folder that's close to what you want
change the instruction in task.json
update the setup script so the starting state matches the new goal
update the final check so it matches the new goal
run the task once yourself and make sure the check behaves the way you expect
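
The first two steps can be scripted. The helper below is a hypothetical sketch, not a Gym Anything tool: it copies a task folder with the standard library and rewrites the description field, assuming task.json uses that key as in the simplified example on this page.

```python
import json
import shutil
from pathlib import Path


def clone_task(source_dir, new_dir, new_description):
    """Copy a task folder and replace the instruction in its task.json."""
    source, target = Path(source_dir), Path(new_dir)
    shutil.copytree(source, target)  # raises FileExistsError if target exists
    task_file = target / "task.json"
    task = json.loads(task_file.read_text(encoding="utf-8"))
    task["description"] = new_description
    task_file.write_text(json.dumps(task, indent=2) + "\n", encoding="utf-8")
    return target
```

If the copied task.json resembles the simplified example, its hook paths and verifier function name still refer to the original task, so those need to be updated by hand along with the setup script and the check.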
