TerrariaBench

Runs

Visual replay pages for substantial open-ended benchmark runs. Short smoke tests and file-level metadata live in the raw section.

Scored Runs

Valid agent-driven progress from the current official sweep.

Same Sweep Failure Lanes

Runs from the other official models are kept visible for diagnosis but separated from scored progress.