TerrariaBench

Gemini 3 Flash via API - Open-ended progress

Agency score: 4. Best visible milestone: wood 100.

Current Sweep Diagnostics

Published pre-fix runs are retained for inspection, but runs with direct tile-control evidence are marked invalid until rerun with real pointer input.

Checkpoint count ranking for the current sweep

Furthest By Agency Score

A higher score means the agent triggered more objective, action-backed game-state milestones during the equal-budget run.
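
As a rough illustration of how such a score could be tallied, the sketch below counts distinct milestones that were preceded by an agent action. The Checkpoint record, its field names, and the sample events are assumptions made for this example, not the benchmark's actual schema or scoring code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    name: str            # hypothetical milestone identifier
    minute: float        # elapsed minutes when the milestone fired
    action_backed: bool  # True if an agent action caused the state change


def agency_score(events):
    """Count distinct milestones that were triggered by agent actions."""
    return len({e.name for e in events if e.action_backed})


events = [
    Checkpoint("inventory_opened", 1.0, True),
    Checkpoint("wood_10", 4.5, True),
    Checkpoint("wood_10", 6.0, True),    # repeat of the same milestone, counted once
    Checkpoint("daybreak", 9.0, False),  # passive game event, not counted
]
print(agency_score(events))  # 2
```

Counting distinct names rather than raw events keeps a repeated trigger from inflating the score, which is one plausible reading of "objective" here.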

Agent-driven checkpoints grouped by difficulty level

Progress By Level

Bars group achieved checkpoints into tree/resource gathering, crafting, biome and underground self-sufficiency, pre-hardmode bosses, and Wall of Flesh progression.
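
The grouping behind these bars might look like the following sketch. The five level names come from the description above; the checkpoint identifiers and their assignment to levels are hypothetical placeholders.

```python
from collections import Counter

# Level names as listed in the chart description, in display order.
LEVELS = [
    "tree/resource gathering",
    "crafting",
    "biome and underground self-sufficiency",
    "pre-hardmode bosses",
    "Wall of Flesh progression",
]

# Hypothetical checkpoint-to-level mapping, for illustration only.
CHECKPOINT_LEVEL = {
    "wood_100": LEVELS[0],
    "stone_50": LEVELS[0],
    "torch_crafted": LEVELS[1],
    "cavern_reached": LEVELS[2],
}


def progress_by_level(achieved):
    """Return one bar height per level, including levels with no checkpoints."""
    counts = Counter(
        CHECKPOINT_LEVEL[c] for c in achieved if c in CHECKPOINT_LEVEL
    )
    return {level: counts.get(level, 0) for level in LEVELS}


bars = progress_by_level(["wood_100", "torch_crafted", "stone_50"])
for level, height in bars.items():
    print(f"{level}: {height}")
```

Emitting a zero for unreached levels keeps every model's bars aligned on the same five categories.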

Cumulative checkpoints over elapsed minutes

Agency Accumulation

Step curves show when action-backed progress happened, distinguishing early bursts from late-horizon progress.
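
A step curve of this kind can be built from nothing more than the elapsed minute of each checkpoint. The sketch below is a minimal version; the function name, the sample timestamps, and the 60-minute horizon are assumptions for illustration.

```python
def step_curve(minutes, horizon):
    """Return (elapsed minute, cumulative checkpoint count) points for a step plot."""
    points = [(0.0, 0)]  # every run starts with zero checkpoints
    for count, t in enumerate(sorted(minutes), start=1):
        points.append((t, count))
    # Extend the last step to the end of the run so flat tails stay visible.
    points.append((horizon, len(minutes)))
    return points


# Hypothetical run: two early checkpoints, then one late in the horizon.
curve = step_curve([3.5, 2.0, 41.0], horizon=60.0)
print(curve)
# [(0.0, 0), (2.0, 1), (3.5, 2), (41.0, 3), (60.0, 3)]
```

In this example the long flat stretch between minute 3.5 and minute 41 is what separates an early burst from late-horizon progress.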

Milestone achievement time heatmap

Concrete Milestones

Cells show the first elapsed minute at which each model reached inventory, mining, wood, stone, torch, and cavern milestones.
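
One way to derive a heatmap row for a single model is to keep the earliest minute seen for each milestone. The six milestone names follow the description above; the event format and sample values are assumed for this sketch.

```python
# Milestone columns, in the order given in the chart description.
MILESTONES = ["inventory", "mining", "wood", "stone", "torch", "cavern"]


def first_minutes(events):
    """Return the first elapsed minute per milestone, or None if never reached."""
    first = {}
    for name, minute in events:
        if name in MILESTONES and (name not in first or minute < first[name]):
            first[name] = minute
    return [first.get(m) for m in MILESTONES]


# Hypothetical (milestone, elapsed minute) events from one run.
row = first_minutes(
    [("wood", 4), ("inventory", 1), ("wood", 9), ("torch", 12), ("mining", 3)]
)
print(row)  # [1, 3, 4, None, 12, None]
```

A None entry corresponds to an empty cell, meaning the model did not reach that milestone within the run.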

Featured Runs

Preview clips show checkpoint and action windows from the game plus agent transcript. Click through for replay clips and logs.

All visual runs

Engineering Artifacts

Raw run IDs, asset counts, docs, and JSON files are still available, just separated from the main reading path.