Code for our EMNLP 2025 Main paper: "FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games"
★24 · Dec 14, 2025 · Updated 2 months ago
Alternatives and similar repositories for FlashAdventure
Users that are interested in FlashAdventure are comparing it to the libraries listed below
- [SIGGRAPH Asia 2025] CHARM: Control-point-based 3D Anime Hairstyle Auto-Regressive Modeling ★44 · Sep 26, 2025 · Updated 5 months ago
- [ICLR 2026] Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn Search Agents ★33 · Feb 1, 2026 · Updated last month
- A Comprehensive Dataset for Advanced Image Generation and Editing ★31 · Oct 2, 2025 · Updated 4 months ago
- A unified robotic manipulation learning framework ★21 · Sep 4, 2025 · Updated 5 months ago
- Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models ★30 · Oct 6, 2025 · Updated 4 months ago
- [EMNLP 2025] Code for paper "Table-R1: Inference-Time Scaling for Table Reasoning" ★29 · Jun 3, 2025 · Updated 8 months ago
- ★22 · Dec 30, 2024 · Updated last year
- Reproducible Language Agent Research ★34 · Jun 25, 2025 · Updated 8 months ago
- [ICCV 2025] MRGen: Segmentation Data Engine for Underrepresented MRI Modalities ★38 · Sep 26, 2025 · Updated 5 months ago
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w… ★12 · Jun 28, 2025 · Updated 8 months ago
- A Text2SQL benchmark for evaluation of Large Language Models ★41 · Updated this week
- [ICCV 2025] Dynamic-VLM ★28 · Dec 16, 2024 · Updated last year
- The Code and Script of "David's Slingshot: A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis" ★34 · Jun 13, 2025 · Updated 8 months ago
- Official PyTorch implementation for "MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens… ★46 · Jun 12, 2025 · Updated 8 months ago
- ★18 · Jun 10, 2025 · Updated 8 months ago
- [NeurIPS ENLSP Workshop'24] CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios ★16 · Oct 18, 2024 · Updated last year
- Sotopia-RL: Reward Design for Social Intelligence ★46 · Jan 29, 2026 · Updated last month
- ★39 · Aug 6, 2025 · Updated 6 months ago
- (ACL-2025 main conference) Dolphin: Moving Towards Closed-loop Auto-research through Thinking, Practice, and Feedback ★38 · Jun 24, 2025 · Updated 8 months ago
- [ACL 2025] Exploring Compositional Generalization of Multimodal LLMs for Medical Imaging ★39 · Jun 4, 2025 · Updated 8 months ago
- [NeurIPS 2025] What Makes a Reward Model a Good Teacher? An Optimization Perspective ★42 · Sep 18, 2025 · Updated 5 months ago
- [ACL 2025 Main] Official Repository for "Evaluating Language Models as Synthetic Data Generators" ★41 · Dec 13, 2024 · Updated last year
- JudgeLRM: Large Reasoning Models as a Judge ★41 · Jan 29, 2026 · Updated last month
- Large-scale semi-supervised framework with 1B+ labeled masks from 48K+ datasets with test-time adaptation to new domains (ICCV25). ★44 · Dec 28, 2025 · Updated 2 months ago
- The official implementation of the paper "DaMo: Data Mixing Optimizer in Fine-tuning Multimodal LLMs for Mobile Phone Agents" ★29 · Oct 23, 2025 · Updated 4 months ago
- ★40 · Jan 14, 2025 · Updated last year
- [ICML 2024] Official Repository for the paper "Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models" ★10 · Jul 19, 2024 · Updated last year
- Symphony: A decentralized multi-agent framework that enables intelligent agents to collaborate seamlessly across heterogeneous edge devi… ★30 · Oct 30, 2025 · Updated 4 months ago
- ★11 · Jun 22, 2025 · Updated 8 months ago
- [ICLR 2026] ParallelBench: Understanding the Tradeoffs of Parallel Decoding in Diffusion LLMs ★30 · Updated this week
- ★72 · Jan 29, 2026 · Updated last month
- A Framework for Evaluating AI Agent Safety in Realistic Environments ★30 · Oct 2, 2025 · Updated 5 months ago
- Official repository of LIBERO-plus, a generalized benchmark for in-depth robustness analysis of vision-language-action models. ★220 · Jan 21, 2026 · Updated last month
- Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity ★22 · Aug 28, 2025 · Updated 6 months ago
- ★24 · Jan 8, 2026 · Updated last month
- Code repository supporting the paper "Auto-Generating Weak Labels for Real & Synthetic Data to Improve Label-Scarce Medical Image Segment… ★11 · Apr 29, 2024 · Updated last year
- ★62 · Jul 1, 2025 · Updated 8 months ago
- Continuous Pipelined Speculative Decoding ★16 · Jan 4, 2026 · Updated last month
- Official Implementation of HIMA (COLM'25) ★19 · Nov 25, 2025 · Updated 3 months ago