Agents
The reference agents and programs that run them against Gym Anything environments.
The agents/ folder contains reference agents and the programs used to run them.
You don't need this part of the repository to use Gym Anything itself.
You only need it if:
- you want to run one of the reference agents
- you want a starting point for your own agent
- you want to evaluate an agent across benchmark tasks
The Two Main Parts
Most of this section comes down to two folders:
- Agents (agents/agents/): agent classes that decide what actions to take
- Evaluation (agents/evaluation/): programs that run agents against environments
What Lives In agents/agents/
agents/agents/ contains the agent classes.
An agent is the code that looks at the current observation and decides what actions to take next.
We currently ship these agents:
- ClaudeAgent
- Gemini3Agent
- Qwen3VLAgent
- KimiAzureAgent
These classes are exported through agents.agents, which is why the evaluation commands refer to names such as --agent ClaudeAgent.
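Because the agents are exported as attributes of one package, a plain string such as ClaudeAgent is enough to find the class. The sketch below shows that kind of lookup. It is an assumption about how name resolution could work, not the evaluation code itself. The agents.agents package is replaced by a hypothetical in-memory module holding placeholder classes. In a real checkout, you would get the package with importlib.import_module("agents.agents"), assuming the repository root is on the import path. The helper names resolve_agent_class and exported_agent_names are illustrative only.

```python
import types


# Placeholder classes standing in for the real exported agents (hypothetical).
class ClaudeAgent:
    pass


class Gemini3Agent:
    pass


# Hypothetical stand-in for the agents.agents package namespace.
agents_pkg = types.ModuleType("agents.agents")
agents_pkg.ClaudeAgent = ClaudeAgent
agents_pkg.Gemini3Agent = Gemini3Agent


def resolve_agent_class(package: types.ModuleType, name: str) -> type:
    """Return the class exported under `name`, e.g. the value given to --agent."""
    cls = getattr(package, name, None)
    if not isinstance(cls, type):
        raise ValueError(f"unknown agent {name!r}")
    return cls


def exported_agent_names(package: types.ModuleType) -> list:
    """List every class the package exports, sorted by name."""
    return sorted(n for n, v in vars(package).items() if isinstance(v, type))
```

With this pattern, exporting a new class from the package is what makes its name usable on the command line.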
What Lives In agents/evaluation/
agents/evaluation/ contains the programs that actually run an agent against environments and tasks.
The two main ones are:
- run_single.py: run one agent on one task
- run_batch.py: run one agent across many tasks
If you only want to try an agent once, start with run_single.py.
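The difference between the two programs is mostly a loop. The sketch below runs a single-task function over a list of tasks and records one outcome per task. It continues past failures so that one crashing task does not abort the whole batch. It is a hypothetical illustration and not the contents of run_batch.py. The names run_many and run_one are assumptions, and run_one stands in for whatever run_single.py does for one task.

```python
from typing import Callable, Dict, List


def run_many(tasks: List[str], run_one: Callable[[str], bool]) -> Dict[str, str]:
    """Run one agent across many tasks, recording an outcome for each task."""
    results: Dict[str, str] = {}
    for task in tasks:
        try:
            # run_one is a hypothetical single-task runner returning True on success.
            results[task] = "success" if run_one(task) else "failure"
        except Exception as exc:
            # Record the error and keep going so the rest of the batch still runs.
            results[task] = f"error: {exc}"
    return results
```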
What An Agent Receives
Our current agent interface is built around four methods:
- __init__
- init(task_description, display_resolution, save_path)
- step(obs, action_outputs)
- finish(...)
The important one is step(...).
That method receives:
- the latest observation from the environment
- the outputs from the previous actions
It returns one or more action groups.
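The sketch below shows the four-method shape as a minimal class. It is hypothetical: it does not subclass agents/agents/base.py, because that file's contents are not shown here, and a real agent should follow it. The following are assumptions:

- the class name MinimalAgent
- the empty __init__ arguments
- the tuple type for display_resolution
- the tool_id format
- the catch-all finish signature (the original elides finish's arguments)

Instead of calling a model, step always waits briefly and then requests a screenshot. It uses only the two control actions this page documents.

```python
from typing import Any, Dict, List, Optional, Tuple


class MinimalAgent:
    """Hypothetical agent following the __init__/init/step/finish shape."""

    def __init__(self) -> None:
        # Real agents likely take model settings here; none are assumed.
        self.steps = 0

    def init(
        self,
        task_description: str,
        display_resolution: Tuple[int, int],
        save_path: str,
    ) -> None:
        # Store the per-run context the environment provides.
        self.task_description = task_description
        self.display_resolution = display_resolution
        self.save_path = save_path
        self.steps = 0

    def step(
        self, obs: Dict[str, Any], action_outputs: Optional[List[Any]]
    ) -> List[Dict[str, Any]]:
        # A real agent would send obs and action_outputs to a model here.
        self.steps += 1
        return [
            {
                "tool_id": f"call_{self.steps}",  # hypothetical id format
                "actions": [
                    {"action": "wait", "time": 1.5},
                    {"action": "screenshot"},
                ],
            }
        ]

    def finish(self, *args: Any, **kwargs: Any) -> None:
        # The real finish(...) signature is not documented on this page.
        pass
```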
Each action group contains:
- tool_id
- actions
The actions list can contain normal environment actions such as mouse and keyboard input. It can also contain built-in control actions such as:
- {"action": "screenshot"}
- {"action": "wait", "time": 1.5}
Those control actions are handled by the environment layer, not by the evaluation loop.
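To make the split concrete, the sketch below shows how an environment layer might execute action groups. It handles screenshot and wait itself and passes every other action to an input backend. It collects one output entry per group, keyed by tool_id. This is an assumption for illustration and not Gym Anything's actual implementation. The function name execute_action_groups, the callbacks take_screenshot and send_input, and the example "click" action are hypothetical. The sleep function is injectable so the example can run without actually waiting.

```python
import time
from typing import Any, Callable, Dict, List


def execute_action_groups(
    groups: List[Dict[str, Any]],
    take_screenshot: Callable[[], Any],
    send_input: Callable[[Dict[str, Any]], Any],
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict[str, Any]]:
    """Run each group's actions in order and collect one output per group."""
    outputs = []
    for group in groups:
        results = []
        for action in group["actions"]:
            kind = action["action"]
            if kind == "screenshot":
                # Control action: capture the screen.
                results.append(take_screenshot())
            elif kind == "wait":
                # Control action: pause; a missing "time" defaulting to 0 is an assumption.
                sleep(float(action.get("time", 0.0)))
                results.append(None)
            else:
                # Mouse/keyboard actions go to the (hypothetical) input backend.
                results.append(send_input(action))
        outputs.append({"tool_id": group["tool_id"], "results": results})
    return outputs
```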
The Fastest Way To Try One
gym-anything benchmark moodle --task enroll_student --agent ClaudeAgent --model claude-opus-4
That command loads the environment, resets it, creates the agent, lets it act until the run finishes, and writes run artifacts.
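The sequence that command performs can be sketched as a small loop: reset, initialize the agent, alternate agent and environment steps until the run finishes, then write artifacts. The sketch below is a hypothetical illustration, not the code behind gym-anything benchmark. StubEnv, StubAgent, run_episode, the (obs, outputs, done) return shape of env.step, and the trajectory.json artifact name are all assumptions made so the example runs on its own. The max_steps cap is a safety limit added for the sketch.

```python
import json
import os
from typing import Any, Dict, List, Tuple


class StubEnv:
    """Hypothetical environment that finishes after a fixed number of steps."""

    def __init__(self, max_steps: int = 3) -> None:
        self.max_steps = max_steps
        self.t = 0

    def reset(self) -> Dict[str, Any]:
        self.t = 0
        return {"step": 0}

    def step(self, groups: List[Dict[str, Any]]) -> Tuple[Dict[str, Any], List[Any], bool]:
        self.t += 1
        outputs = [{"tool_id": g["tool_id"], "results": []} for g in groups]
        return {"step": self.t}, outputs, self.t >= self.max_steps


class StubAgent:
    """Hypothetical agent that always asks for a screenshot."""

    def init(self, task_description, display_resolution, save_path) -> None:
        self.task_description = task_description

    def step(self, obs, action_outputs) -> List[Dict[str, Any]]:
        return [{"tool_id": f"call_{obs['step']}", "actions": [{"action": "screenshot"}]}]

    def finish(self) -> None:
        pass


def run_episode(env, agent, task_description: str, display_resolution, save_path: str,
                max_steps: int = 50) -> int:
    """Reset, let the agent act until done, then write a trajectory artifact."""
    obs = env.reset()
    agent.init(task_description, display_resolution, save_path)
    action_outputs = None
    trajectory = []
    for _ in range(max_steps):
        groups = agent.step(obs, action_outputs)
        trajectory.append(groups)
        obs, action_outputs, done = env.step(groups)
        if done:
            break
    agent.finish()
    os.makedirs(save_path, exist_ok=True)
    with open(os.path.join(save_path, "trajectory.json"), "w") as f:
        json.dump(trajectory, f)
    return len(trajectory)
```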
Run gym-anything agents to see all available agent names.
If You Want To Add Your Own Agent
The simplest workflow is:
- copy a nearby file in agents/agents/
- implement the same basic methods
- export the new class from agents/agents/__init__.py
- run it with gym-anything benchmark
If you're starting from scratch, read:
- agents/agents/base.py
- one concrete agent, such as agents/agents/claude.py
- agents/evaluation/run_single.py
That gives you the agent interface first, then one implementation, then the program that drives it.