Environments for frontier models.

Reinforcement learning environments for post-training. A model attempts real work through a tool-using agent loop, every step is graded against ground truth, and the reward feeds back into training.

01 Environments
A neutral record, measured the same for every lab.

An environment is a task with a checkable outcome, taken from real work. The model is graded step by step against ground truth as it goes. Grade the same run on the same task and you get the same score, whichever lab is running it.
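A minimal sketch of that idea, assuming a task is a fixed sequence of tool calls with known correct outputs. The `Step`, `Task`, and `grade` names, the tool names, and the order data are all hypothetical and only illustrate step-level grading against ground truth. They are not the actual environment format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One expected action in a task, with its ground-truth output."""
    tool: str
    expected: str


@dataclass(frozen=True)
class Task:
    """A task with a checkable outcome (hypothetical structure)."""
    name: str
    steps: tuple[Step, ...]


def grade(task: Task, trajectory: list[tuple[str, str]]) -> float:
    """Return the fraction of task steps the trajectory matched.

    The grader is a pure function of the task and the trajectory:
    no clock, no randomness, no lab-specific state. The same run
    therefore earns the same score wherever it is graded.
    """
    if not task.steps:
        return 0.0
    correct = sum(
        1
        for step, (tool, output) in zip(task.steps, trajectory)
        if tool == step.tool and output == step.expected
    )
    return correct / len(task.steps)


# Illustrative data only.
task = Task(
    name="refund-order",
    steps=(Step("lookup_order", "order-1042"), Step("issue_refund", "refunded")),
)
run = [("lookup_order", "order-1042"), ("issue_refund", "declined")]
score = grade(task, run)  # one of the two steps matches ground truth
```

Keeping the grader free of hidden state is the design choice that matters here. Any lab can replay the recorded run and arrive at the same number.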

02 Method
One process, from a capability to a graded environment. The same five steps every time.
01
Perceive
Map a capability and its failure modes until the reward is well defined.
Capability
02
Represent
Formalize it into a task distribution with a verifiable rubric.
Rubric
03
Build
Stand up environments that stay cleanly separated from evaluation sets and resist contamination.
No contamination
04
Scale
Mass-produce variants across the distribution. Early environments become training data.
Distribution
05
Choose
Score pass@k for each model. Point the next environment at the tasks the models fail.
pass@k
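The last step can be sketched as follows. The page does not say how pass@k is computed, so this assumes the unbiased estimator commonly used for code-generation evals: with n sampled attempts of which c pass, pass@k = 1 - C(n - c, k) / C(n, k). The model names, task names, counts, and the `weakest_task` helper are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n sampled attempts, c of which passed.

    Computes 1 - C(n - c, k) / C(n, k): the probability that a random
    size-k subset of the n attempts contains at least one pass.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    return 1.0 - comb(n - c, k) / comb(n, k)


def weakest_task(results: dict, k: int) -> str:
    """Return the task with the lowest mean pass@k across models."""
    tasks = sorted({t for per_task in results.values() for t in per_task})

    def mean_score(task: str) -> float:
        scores = []
        for per_task in results.values():
            if task in per_task:
                n, c = per_task[task]
                scores.append(pass_at_k(n, c, k))
        return sum(scores) / len(scores)

    return min(tasks, key=mean_score)


# Illustrative data only: model -> task -> (attempts, passes).
results = {
    "model-a": {"refund-order": (10, 7), "reconcile-ledger": (10, 1)},
    "model-b": {"refund-order": (10, 9), "reconcile-ledger": (10, 0)},
}
next_target = weakest_task(results, k=1)
```

With these made-up counts, `next_target` is the task both models rarely solve, which is where the next environment would be aimed.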
03 Domains
Where the method is pointed. In priority order, by stakes.
01
Safety
Alignment and oversight. The first call on everything.
Priority
02
Defense
High-stakes capability and red-team work.
High-stakes
03
Science
Bio, pharma, research automation.
Research
04
Commerce
Agentic work on real company operations. Live today.
Live
04 Why Idler
Grounded, broad, frontier.
A
Grounded
Environments built from real production data, not invented scenarios. Less reward hacking, better transfer.
Real data
B
Broad
Coverage across coding, tool use, long-horizon tasks, and error recovery.
Coverage
C
Frontier
Built for the models clearing the hardest evals, on the work they fail next.
Frontier
05 Contact
Name the capability your models miss. We build the environment, graded against ground truth.