alexzhang13 / videogamebench
Benchmark environment for evaluating vision-language models (VLMs) on popular video games!
☆262 Updated last week
Alternatives and similar repositories for videogamebench
Users who are interested in videogamebench are comparing it to the libraries listed below.
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆172 Updated 4 months ago
- MLGym A New Framework and Benchmark for Advancing AI Research Agents ☆505 Updated 3 weeks ago
- ☆68 Updated last month
- GRadient-INformed MoE ☆263 Updated 8 months ago
- ☆140 Updated last month
- ☆153 Updated last month
- ☆145 Updated last month
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning ☆173 Updated this week
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆139 Updated 3 months ago
- Official PyTorch implementation for Hogwild! Inference: Parallel LLM Generation with a Concurrent Attention Cache ☆105 Updated last month
- EvaByte: Efficient Byte-level Language Models at Scale ☆101 Updated last month
- Official implementation of the paper "Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space" ☆136 Updated last week
- prime-rl is a codebase for decentralized async RL training at scale ☆318 Updated this week
- ☆86 Updated 2 weeks ago
- ☆111 Updated 5 months ago
- Scaling Computer-Use Grounding via UI Decomposition and Synthesis ☆49 Updated last week
- Train your own SOTA deductive reasoning model ☆93 Updated 3 months ago
- Scaling Data for SWE-agents ☆220 Updated this week
- Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement ☆92 Updated 3 months ago
- Code for the paper: "Learning to Reason without External Rewards" ☆237 Updated this week
- Dream 7B, a large diffusion language model ☆737 Updated this week
- Benchmarking Agentic LLM and VLM Reasoning On Games ☆146 Updated last month
- OS-ATLAS: A Foundation Action Model For Generalist GUI Agents ☆342 Updated last month
- Atropos is a Language Model Reinforcement Learning Environments framework for collecting and evaluating LLM trajectories through diverse … ☆447 Updated this week
- accompanying material for sleep-time compute paper ☆90 Updated last month
- Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution" ☆535 Updated 2 months ago
- SkyRL-v0: Train Real-World Long-Horizon Agents via Reinforcement Learning ☆375 Updated this week
- Official repo for Learning to Reason for Long-Form Story Generation ☆60 Updated last month
- AgentLab: An open-source framework for developing, testing, and benchmarking web agents on diverse tasks, designed for scalability and re… ☆339 Updated this week
- PyTorch implementation of models from the Zamba2 series. ☆182 Updated 4 months ago