Reinforcement learning environments for post-training. Signal moves through the corpus: model to task to grade, and the reward returns to training.
An environment is a real engineering task with a checkable outcome. The model works through it in a tool-using agent loop, and every step is scored against ground truth. No invented problems. No rewards you cannot verify.
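A minimal sketch of that loop, under stated assumptions. The `Environment` class, `run_episode`, the scripted policy, and the toy sorting task are all hypothetical names made up for illustration. A real environment would wrap an actual engineering task and a trained model rather than a scripted stand-in.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str, str]  # (tool name, argument, observation)


@dataclass
class Environment:
    """Hypothetical container: a task, its tools, and a ground-truth check."""
    prompt: str
    tools: Dict[str, Callable[[str], str]]
    check: Callable[[str], bool]  # True when an observation matches ground truth
    max_steps: int = 4


def run_episode(
    env: Environment,
    policy: Callable[[str, List[Step]], Tuple[str, str]],
) -> Tuple[List[float], float]:
    """Run the agent loop and return (per-step scores, total reward)."""
    history: List[Step] = []
    scores: List[float] = []
    for _ in range(env.max_steps):
        tool_name, arg = policy(env.prompt, history)   # model picks a tool call
        observation = env.tools[tool_name](arg)        # environment executes it
        history.append((tool_name, arg, observation))
        score = 1.0 if env.check(observation) else 0.0  # graded against ground truth
        scores.append(score)
        if score == 1.0:
            break  # task solved, stop early
    return scores, sum(scores)


# Toy task (assumption): sort a comma-separated list of integers.
env = Environment(
    prompt="Sort 3,1,2 in ascending order.",
    tools={
        "echo": lambda s: s,
        "sort": lambda s: ",".join(sorted(s.split(","), key=int)),
    },
    check=lambda obs: obs == "1,2,3",
)


def scripted_policy(prompt: str, history: List[Step]) -> Tuple[str, str]:
    """Stand-in for the model: one wrong step, then the right tool call."""
    if not history:
        return ("echo", "3,1,2")
    return ("sort", "3,1,2")


scores, reward = run_episode(env, scripted_policy)
print(scores, reward)
```

The check is a plain function over the observed result, so the reward can be recomputed and audited independently of the model that earned it.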