TerrariaBench / raw

Raw Runs

Engineering view for all published artifacts, including smoke runs, short checks, raw IDs, frame counts, clips, and result files.

Published Runs: 7
Active Task: Open-ended progress
Scoring: Agency score
Assets: Videos and JSON

All Published Runs

Run | Run ID | Harness | Model | Problem | Agency | Raw | Survival | Level | Last Milestone | Elapsed | Frames | Clips | Status
Gemini 3 Flash via api - Open-ended progress | 20260507_164917_1_api_google_gemini-3-flash-preview_03_open_ended_progress | api | Gemini 3 Flash | Open-ended progress | 4 | 7 | 0 | L2 | wood 100 | 28802s | 4814 | 1 | stalled
Claude Opus 4.7 via api - Open-ended progress | 20260507_164917_1_api_anthropic_claude-opus-4.7_03_open_ended_progress | api | Claude Opus 4.7 | Open-ended progress | 0 | 4 | 0 | L0 | inventory opened | 28802s | 3326 | 1 | invalid
Gemini 3.1 Pro via api - Open-ended progress | 20260507_164917_1_api_google_gemini-3.1-pro-preview_03_open_ended_progress | api | Gemini 3.1 Pro | Open-ended progress | 3 | 5 | 0 | L2 | wood 25 | 28802s | 3734 | 1 | stalled
GPT-5.5 via api - Open-ended progress | 20260507_164917_1_api_openai_gpt-5.5_03_open_ended_progress | api | GPT-5.5 | Open-ended progress | 0 | 3 | 0 | L0 | held pickaxe | 28802s | 3080 | 1 | invalid
Kimi K2.6 via api - Open-ended progress | 20260507_164917_1_api_moonshotai_kimi-k2.6_03_open_ended_progress | api | Kimi K2.6 | Open-ended progress | 0 | 4 | 0 | L0 | inventory opened | 28802s | 1622 | 1 | invalid
Qwen 3.6 27B via api - Open-ended progress | 20260507_164917_1_api_qwen_qwen3.6-27b_03_open_ended_progress | api | Qwen 3.6 27B | Open-ended progress | 0 | 4 | 0 | L0 | inventory opened | 21935s | 6152 | 1 | invalid
Gemini 3.1 Flash Lite via api - Open-ended progress | 20260507_164917_1_api_google_gemini-3.1-flash-lite-preview_03_open_ended_progress | api | Gemini 3.1 Flash Lite | Open-ended progress | 3 | 6 | 0 | L2 | wood 25 | 21930s | 5838 | 1 | complete
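
The raw run IDs listed above appear to follow a consistent underscore-separated layout, so they can be split into fields for filtering or grouping. The sketch below is a minimal illustration, not part of the TerrariaBench harness. The field names (`started`, `index`, `harness`, `provider`, `model`, `task_number`, `task`) and the reading of the leading digits as a start date and time are assumptions inferred from the seven IDs on this page, and `parse_run_id` is a hypothetical helper.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the IDs on this page (not a documented spec):
#   <YYYYMMDD>_<HHMMSS>_<index>_<harness>_<provider>_<model>_<task number>_<task slug>
# The model slug uses hyphens and dots but no underscores in every listed ID.
RUN_ID_PATTERN = re.compile(
    r"^(?P<date>\d{8})_(?P<time>\d{6})_(?P<index>\d+)_"
    r"(?P<harness>[^_]+)_(?P<provider>[^_]+)_(?P<model>[^_]+)_"
    r"(?P<task_number>\d+)_(?P<task>.+)$"
)


def parse_run_id(run_id: str) -> dict:
    """Split a raw run ID into named fields (hypothetical helper)."""
    match = RUN_ID_PATTERN.match(run_id)
    if match is None:
        raise ValueError(f"unrecognized run ID layout: {run_id!r}")
    fields = match.groupdict()
    # Assumption: the first two groups are a start date and time.
    stamp = fields.pop("date") + fields.pop("time")
    fields["started"] = datetime.strptime(stamp, "%Y%m%d%H%M%S")
    fields["index"] = int(fields["index"])
    return fields


if __name__ == "__main__":
    parsed = parse_run_id(
        "20260507_164917_1_api_openai_gpt-5.5_03_open_ended_progress"
    )
    for key, value in parsed.items():
        print(f"{key}: {value}")
```

An ID that does not match the assumed layout raises `ValueError` rather than returning partial fields, which keeps a future naming change from silently producing mislabeled rows.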