BigComputer-Project / SWE-Arena
SWE Arena
☆33 Updated last month
Alternatives and similar repositories for SWE-Arena
Users who are interested in SWE-Arena are comparing it to the libraries listed below.
- InstructCoder: Instruction Tuning Large Language Models for Code Editing | Oral ACL-2024 srw ☆62 Updated 8 months ago
- Scaling Data for SWE-agents ☆212 Updated this week
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆173 Updated 2 months ago
- Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory ☆61 Updated last week
- Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents ☆73 Updated last month
- RepoQA: Evaluating Long-Context Code Understanding ☆108 Updated 7 months ago
- ☆114 Updated 3 months ago
- Training and Benchmarking LLMs for Code Preference. ☆33 Updated 6 months ago
- ☆34 Updated 2 months ago
- Replicating O1 inference-time scaling laws ☆87 Updated 6 months ago
- [NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898 ☆219 Updated last year
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆82 Updated last week
- ☆29 Updated 2 weeks ago
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆90 Updated 2 months ago
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆89 Updated last week
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples ☆88 Updated last week
- A benchmark for LLMs on complicated tasks in the terminal ☆134 Updated this week
- ☆61 Updated last year
- ☆58 Updated 2 weeks ago
- General Reasoner: Advancing LLM Reasoning Across All Domains ☆117 Updated this week
- A dataset of LLM-generated chain-of-thought steps annotated with mistake location. ☆81 Updated 9 months ago
- Benchmark and research code for the paper SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks ☆207 Updated 3 weeks ago
- ☆49 Updated 3 weeks ago
- Scaling Computer-Use Grounding via UI Decomposition and Synthesis ☆49 Updated this week
- ☆82 Updated last year
- ☆39 Updated 11 months ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆98 Updated 3 weeks ago
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆54 Updated last year
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning ☆169 Updated this week
- CRUXEval: Code Reasoning, Understanding, and Execution Evaluation ☆140 Updated 7 months ago