01 Environments
Every environment is a real engineering task, graded against a working result.
Real tasks
Pulled from live codebases, with a checkable result.
Dense reward
Every step scored, not just the final patch.
Real rollouts
Grounded in production engineering, not invented benchmarks.
02 Method
From real engineering work to a graded world. The same five steps every time.
01 Signal
Perceive
Find where coding agents break on real engineering work.
02 Spec
Represent
Turn the task into an environment with a checkable result.
03 Build
Build
Stand up the repo, the tests, and the grader.
04 Scale
Scale
Mass-produce variants. Early environments become training data.
05 Loop
Measure
Score where models fail, and aim the next environment there.
03 Domains
The engineering work the environments are built from.
Debugging
Reproduce, localize, and fix real bugs in a live repo.
Fix
Feature work
Build features across an unfamiliar codebase.
Build
Refactors
Restructure code without breaking what works.
Shape
Tests & review
Write tests, read diffs, and catch regressions.
Verify
04 Why Idler
Real, graded, frontier.
A
Real
Environments from real engineering work, never invented benchmarks. The skill transfers.
B
Graded
Every step checked against a working result. Dense reward, not just pass or fail.
C
Frontier
Built for the best models, aimed at the engineering they still get wrong.
05 Contact
Tell us where your models fail at real engineering. We build the world that trains them.