Qurrent-AI / RES-Q
RES-Q: Evaluating the Code-Editing Capability of Large Language Model Systems at the Repository Scale
☆27 Updated last year
Alternatives and similar repositories for RES-Q
Users who are interested in RES-Q are comparing it to the libraries listed below
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research. ☆133 Updated last week
- Sphynx Hallucination Induction ☆53 Updated last year
- Just a bunch of benchmark logs for different LLMs ☆119 Updated last year
- ☆152 Updated 5 months ago
- Mixing Language Models with Self-Verification and Meta-Verification ☆112 Updated last year
- ☆223 Updated this week
- ☆118 Updated 2 weeks ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆189 Updated 11 months ago
- Inference-time scaling for LLMs-as-a-judge. ☆328 Updated 3 months ago
- Red-Teaming Language Models with DSPy ☆250 Updated 11 months ago
- ☆133 Updated 3 months ago
- Train your own SOTA deductive reasoning model ☆107 Updated 11 months ago
- ☆137 Updated 10 months ago
- Functional Benchmarks and the Reasoning Gap ☆89 Updated last year
- Repository for the paper Stream of Search: Learning to Search in Language ☆153 Updated last year
- ☆33 Updated 8 months ago
- A library for benchmarking the Long Term Memory and Continual learning capabilities of LLM based agents. With all the tests and code you… ☆82 Updated last year
- ☆56 Updated last year
- Scaling is a distributed training library and installable dependency designed to scale up neural networks, with a dedicated module for tr… ☆66 Updated 2 months ago
- Source code for the collaborative reasoner research project at Meta FAIR. ☆112 Updated 9 months ago
- The first dense retrieval model that can be prompted like an LM ☆90 Updated 9 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆175 Updated last year
- Official code for the paper "ADaPT: As-Needed Decomposition and Planning with Language Models" ☆90 Updated 2 years ago
- Harness used to benchmark aider against SWE Bench benchmarks ☆79 Updated last year
- Synthetic Data for LLM Fine-Tuning ☆120 Updated 2 years ago
- CodeSage: Code Representation Learning At Scale (ICLR 2024) ☆116 Updated last year
- ☆26 Updated last year
- ☆120 Updated last year
- ☆59 Updated last year
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' ☆235 Updated 6 months ago