ahnjaewoo / FlashAdventure
Code for our EMNLP 2025 Main paper: "FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games"
☆23 Updated last month
Alternatives and similar repositories for FlashAdventure
Users who are interested in FlashAdventure compare it to the repositories listed below
- [NeurIPS 2025 Spotlight] Official repository for "Web-Shepherd: Advancing PRMs for Reinforcing Web Agents" ☆53 Updated 8 months ago
- ☆32 Updated 2 weeks ago
- This repo contains code for the paper "Both Text and Images Leaked! A Systematic Analysis of Data Contamination in Multimodal LLM" ☆17 Updated 3 months ago
- [NeurIPS 2024] A task generation and model evaluation system for multimodal language models. ☆73 Updated last year
- AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents ☆37 Updated 4 months ago
- [NeurIPS 2024] Official Implementation for Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks ☆94 Updated 7 months ago
- JudgeLRM: Large Reasoning Models as a Judge ☆40 Updated last week
- A Recipe for Building LLM Reasoners to Solve Complex Instructions ☆29 Updated 4 months ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement ☆129 Updated 6 months ago
- Resa: Transparent Reasoning Models via SAEs ☆47 Updated 4 months ago
- ☆68 Updated 4 months ago
- ☆29 Updated 3 months ago
- ☆51 Updated 8 months ago
- Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning ☆60 Updated 3 months ago
- Reagent: Exploring Reasoning Reward Model for Agents ☆31 Updated this week
- [COLM 2025] "C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing" ☆19 Updated 10 months ago
- [ACL 2025 Findings] Benchmarking Multihop Multimodal Internet Agents ☆48 Updated 11 months ago
- [ACL 2025] A Generalizable and Purely Unsupervised Self-Training Framework ☆71 Updated 8 months ago
- ☆64 Updated 3 months ago
- ☆33 Updated 6 months ago
- An automated data pipeline scaling RL to pretraining levels ☆73 Updated 3 months ago
- [EMNLP 2025 Main] AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time ☆89 Updated 7 months ago
- Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks ☆36 Updated 2 months ago
- More reliable Video Understanding Evaluation ☆13 Updated 4 months ago
- Official code repository for Sketch-of-Thought (SoT) ☆135 Updated 9 months ago
- [ICLR 2026] Official code of "Search Arena: Analyzing Search-Augmented LLMs". ☆49 Updated last week
- This is the official project of paper: Compress to Impress: Unleashing the Potential of Compressive Memory in Real-World Long-Term Conver… ☆22 Updated last year
- The official repository of "R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Integration" ☆136 Updated 5 months ago
- Code for Paper: Harnessing Webpage UIs for Text-Rich Visual Understanding ☆53 Updated last year
- [ICLR 2026] RPG: KL-Regularized Policy Gradient (https://arxiv.org/abs/2505.17508) ☆65 Updated last week