wellecks / lm-evaluation-harness
A framework for few-shot evaluation of autoregressive language models.
☆24 · Updated last year
Alternatives and similar repositories for lm-evaluation-harness
Users interested in lm-evaluation-harness are comparing it to the libraries listed below.
- Language models scale reliably with over-training and on downstream tasks ☆97 · Updated last year
- ☆82 · Updated 4 months ago
- A unified benchmark for math reasoning ☆88 · Updated 2 years ago
- ☆52 · Updated last year
- DEMix Layers for Modular Language Modeling ☆53 · Updated 3 years ago
- ☆34 · Updated last year
- ☆85 · Updated last year
- Code and data used in the paper: "Training on Incorrect Synthetic Data via RL Scales LLM Math Reasoning Eight-Fold" ☆30 · Updated 11 months ago
- Revisiting Mid-training in the Era of RL Scaling ☆48 · Updated last month
- ☆33 · Updated last year
- Simple and efficient pytorch-native transformer training and inference (batched) ☆75 · Updated last year
- ☆31 · Updated last year
- ☆97 · Updated last year
- Code for the paper "Decomposing the Enigma: Subgoal-based Demonstration Learning for Formal Theorem Proving" ☆19 · Updated 2 years ago
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data" ☆47 · Updated last year
- ☆48 · Updated 3 weeks ago
- [ICLR 2025] "Training LMs on Synthetic Edit Sequences Improves Code Synthesis" (Piterbarg, Pinto, Fergus) ☆19 · Updated 3 months ago
- ☆45 · Updated last year
- Discriminator-Guided Chain-of-Thought Reasoning ☆47 · Updated 7 months ago
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs ☆54 · Updated last year
- [EMNLP Findings 2024 & ACL 2024 NLRSE Oral] Enhancing Mathematical Reasoning… ☆51 · Updated last year
- ☆11 · Updated 11 months ago
- The official repo for "TheoremQA: A Theorem-driven Question Answering dataset" (EMNLP 2023) ☆32 · Updated last year
- The official code of EMNLP 2022, "SCROLLS: Standardized CompaRison Over Long Language Sequences" ☆69 · Updated last year
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models ☆55 · Updated 3 months ago
- ☆50 · Updated last year
- Codebase for Context-aware Meta-learned Loss Scaling (CaMeLS). https://arxiv.org/abs/2305.15076 ☆25 · Updated last year
- The official implementation of "Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free" ☆38 · Updated 3 weeks ago
- Self-Alignment with Principle-Following Reward Models ☆161 · Updated 3 weeks ago
- [ACL'24 Oral] Analysing The Impact of Sequence Composition on Language Model Pre-Training ☆21 · Updated 9 months ago