BigComputer-Project / SWE-Arena
SWE Arena
☆34 Updated last week
Alternatives and similar repositories for SWE-Arena
Users who are interested in SWE-Arena are comparing it to the libraries listed below.
- Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory ☆66 Updated last month
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples ☆100 Updated last month
- Scaling Data for SWE-agents ☆293 Updated this week
- ☆117 Updated 4 months ago
- A benchmark that challenges language models to code solutions for scientific problems ☆127 Updated last week
- ☆104 Updated 2 months ago
- A simple unified framework for evaluating LLMs ☆221 Updated 3 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆173 Updated 4 months ago
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning ☆207 Updated this week
- Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents ☆112 Updated last week
- RepoQA: Evaluating Long-Context Code Understanding ☆109 Updated 8 months ago
- [NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898 ☆221 Updated last year
- Evaluation of LLMs on latest math competitions ☆140 Updated 2 months ago
- [NeurIPS 2024] Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? ☆127 Updated 10 months ago