commit-0 / commit0
Commit0: Library Generation from Scratch
☆142 · Updated 2 weeks ago
Alternatives and similar repositories for commit0:
Users interested in commit0 are comparing it to the libraries listed below.
- RepoQA: Evaluating Long-Context Code Understanding ☆107 · Updated 5 months ago
- r2e: turn any GitHub repository into a programming agent environment ☆108 · Updated last month
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆167 · Updated last month
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆170 · Updated 2 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆125 · Updated 4 months ago
- Train your own SOTA deductive reasoning model ☆83 · Updated last month
- LILO: Library Induction with Language Observations ☆85 · Updated 7 months ago
- Evaluating LLMs with fewer examples ☆148 · Updated 11 months ago
- Score LLM pretraining data with classifiers ☆55 · Updated last year
- PyTorch building blocks for the OLMo ecosystem ☆188 · Updated this week
- ☆114 · Updated last month
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning ☆102 · Updated this week
- CRUXEval: Code Reasoning, Understanding, and Execution Evaluation ☆135 · Updated 6 months ago
- ☆75 · Updated this week
- Evaluation of LLMs on latest math competitions ☆93 · Updated last week
- Just a bunch of benchmark logs for different LLMs ☆119 · Updated 8 months ago
- Website for hosting the Open Foundation Models Cheat Sheet. ☆266 · Updated last month
- Long context evaluation for large language models ☆205 · Updated last month
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym ☆427 · Updated last week
- Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs" ☆314 · Updated this week
- Can Language Models Solve Olympiad Programming? ☆112 · Updated 2 months ago
- A simple unified framework for evaluating LLMs ☆210 · Updated last week
- Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions ☆41 · Updated 8 months ago
- ☆60 · Updated 11 months ago
- [NeurIPS'24] SelfCodeAlign: Self-Alignment for Code Generation ☆301 · Updated last month
- Functional Benchmarks and the Reasoning Gap ☆84 · Updated 6 months ago
- The official evaluation suite and dynamic data release for MixEval. ☆234 · Updated 5 months ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆60 · Updated 3 weeks ago
- ☆81 · Updated last month
- SWE Arena ☆29 · Updated 2 weeks ago