Qurrent-AI / RES-Q
RES-Q: Evaluating the Code-Editing Capability of Large Language Model Systems at the Repository Scale
☆27 · Updated 6 months ago
Alternatives and similar repositories for RES-Q:
Users who are interested in RES-Q are comparing it to the libraries listed below:
- Code for TrackTheMind ☆68 · Updated last month
- Just a bunch of benchmark logs for different LLMs ☆116 · Updated 5 months ago
- Sphynx Hallucination Induction ☆51 · Updated 5 months ago
- Functional Benchmarks and the Reasoning Gap ☆82 · Updated 3 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆154 · Updated this week
- Scaling is a distributed training library and installable dependency designed to scale up neural networks, with a dedicated module for tr… ☆54 · Updated 2 months ago
- Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions ☆41 · Updated 5 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆154 · Updated 2 months ago
- Small, simple agent task environments for training and evaluation ☆18 · Updated 2 months ago
- Evaluating LLMs with CommonGen-Lite ☆87 · Updated 9 months ago
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research. ☆74 · Updated this week
- Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task. ☆133 · Updated last week
- look how they massacred my boy ☆63 · Updated 3 months ago
- Code for the paper "Fishing for Magikarp" ☆140 · Updated this week
- smolLM with Entropix sampler on PyTorch ☆147 · Updated 2 months ago
- Repository for the paper Stream of Search: Learning to Search in Language ☆119 · Updated 5 months ago
- Code for the paper: Training Software Engineering Agents and Verifiers with SWE-Gym ☆209 · Updated last week
- Mixing Language Models with Self-Verification and Meta-Verification ☆100 · Updated last month
- ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward exp… ☆215 · Updated 9 months ago
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖 ☆55 · Updated last month