bigcode-project / selfcodealignLinks

[NeurIPS'24] SelfCodeAlign: Self-Alignment for Code Generation

☆309

Alternatives and similar repositories for selfcodealign

Users that are interested in selfcodealign are comparing it to the libraries listed below

Sorting:

lm-sys / llm-decontaminator
Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples"
☆306Updated last year
evalplus / repoqa
RepoQA: Evaluating Long-Context Code Understanding
☆113Updated 9 months ago
SWE-bench / SWE-smith
Scaling Data for SWE-agents
☆328Updated this week
facebookresearch / cruxeval
CRUXEval: Code Reasoning, Understanding, and Execution Evaluation
☆151Updated 9 months ago
abacaj / code-eval
Run evaluation on LLMs using human-eval benchmark
☆417Updated last year
bigcode-project / octopack
🐙 OctoPack: Instruction Tuning Code Large Language Models
☆472Updated 5 months ago
WildEval / ZeroEval
A simple unified framework for evaluating LLMs
☆229Updated 3 months ago
dust-tt / llama-ssp
Experiments on speculative sampling with Llama models
☆128Updated 2 years ago
InternLM / SWE-Fixer
☆108Updated 2 months ago
Leolty / repobench
✨ RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems - ICLR 2024
☆168Updated 11 months ago
SWE-bench / experiments
Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task.
☆197Updated 3 weeks ago
casper-hansen / OpenCoconut
OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.
☆173Updated 6 months ago
Re-Align / URIAL
☆311Updated last year
arcee-ai / EvolKit
EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M…
☆229Updated 9 months ago
ScalingIntelligence / Archon
Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file.
☆174Updated 4 months ago
SWE-Gym / SWE-Gym
Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025]
☆513Updated this week
JinjieNi / MixEval
The official evaluation suite and dynamic data release for MixEval.
☆242Updated 8 months ago
nlpxucan / evol-instruct
☆270Updated 2 years ago
allenai / WildBench
Benchmarking LLMs with Challenging Tasks from Real Users
☆233Updated 9 months ago
princeton-nlp / intercode
[NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898
☆223Updated last year
NL2Code / CodeR
☆159Updated 11 months ago
LiveCodeBench / LiveCodeBench
Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code"
☆608Updated 2 weeks ago
loubnabnl / santacoder-finetuning
Fine-tune SantaCoder for Code/Text Generation.
☆192Updated 2 years ago
allenai / olmes
Reproducible, flexible LLM evaluations
☆226Updated 3 weeks ago
TIGER-AI-Lab / MMLU-Pro
The code and data for "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark" [NeurIPS 2024]
☆264Updated 5 months ago
my-other-github-account / llm-humaneval-benchmarks
☆84Updated 2 years ago
booydar / babilong
BABILong is a benchmark for LLM evaluation using the needle-in-a-haystack approach.
☆208Updated 2 months ago
Digitous / LLM-SLERP-Merge
Spherical Merge Pytorch/HF format Language Models with minimal feature loss.
☆135Updated last year
facebookresearch / swe-rl
Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution"
☆571Updated 4 months ago
jeffreysijuntan / lloco
The official repo for "LLoCo: Learning Long Contexts Offline"
☆118Updated last year