idler Material / Montessori

An eval is a learning material.

Reinforcement learning environments that train frontier models to expert level, graded against ground truth. The model is the student; the environment is what it grasps or drops.

Environments · what each one is
Verifiable tasks
Tasks that can be checked against ground truth.
Dense reward
Scored step by step, not just pass or fail.
Real rollouts
Grounded in real production work, not invented.
Method · from a problem space to evaluating environments
Domains · where we collaborate, in priority order
Safety
Alignment and oversight. Defense evals fit here. The first call on everything.
Defense
High-stakes capability evaluation and red-teaming, including red-teaming for weapons capabilities.
Science
Bio, pharma, clinical-trials automation, and fundamental research.
Commerce
Indexing workflows from real companies. Our current focus.
Why Idler · the neutral record
Grounded
Environments from real production work, not invented.
Neutral
A record measured the same way for every lab.
Broad
Across the problem space and its sub-spaces.
About · mission and the neutral record
Mission
Train frontier models on environments built from real problem spaces, graded against ground truth.
The neutral record
A corpus measured the same way for every lab.
Team
A small team, working quietly with frontier labs.
Blog · research notes and method write-ups
Shelf Life
Representing a problem space in thirty pages.
Environments under RL
What our environments do to models when used for RL training.
Dense reward
Why step-by-step grading beats pass or fail.
Careers · open roles
Collaborators
Run this process with new people. Priority: Safety, Defense, Science, Commerce.
Environment engineering
Build and scale environments across problem spaces.
Contact · request access and partnerships
Request access
See the environments and what they measure.
Partnerships
Run the process together on a problem space.
Reach us
Idler Inc. / San Francisco · idler.ai · hi@idler.ai