allenai / OLMo-Eval
Evaluation suite for LLMs
☆339 Updated 3 months ago
Alternatives and similar repositories for OLMo-Eval:
Users who are interested in OLMo-Eval are comparing it to the libraries listed below.
- ☆502 Updated 4 months ago
- Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples" ☆299 Updated last year
- Implementation of paper Data Engineering for Scaling Language Models to 128K Context ☆454 Updated last year
- Train Models Contrastively in Pytorch ☆666 Updated last month
- [ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning ☆646 Updated 9 months ago
- PyTorch building blocks for the OLMo ecosystem ☆172 Updated this week
- Official repository for ORPO ☆445 Updated 9 months ago
- Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware. ☆706 Updated 6 months ago
- Code for Quiet-STaR ☆721 Updated 7 months ago
- ☆312 Updated 6 months ago
- Data and tools for generating and inspecting OLMo pre-training data. ☆1,170 Updated 2 weeks ago
- Codes for the paper "∞Bench: Extending Long Context Evaluation Beyond 100K Tokens": https://arxiv.org/abs/2402.13718 ☆313 Updated 6 months ago
- The official evaluation suite and dynamic data release for MixEval. ☆233 Updated 4 months ago
- RewardBench: the first evaluation tool for reward models. ☆532 Updated last month
- A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs). ☆819 Updated this week
- Generative Representational Instruction Tuning ☆612 Updated last week
- A repository for research on medium sized language models. ☆493 Updated 2 months ago
- Reproducible, flexible LLM evaluations ☆180 Updated this week
- The official implementation of Self-Play Fine-Tuning (SPIN) ☆1,135 Updated 10 months ago
- This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks. ☆544 Updated last year
- Benchmarking LLMs with Challenging Tasks from Real Users ☆219 Updated 4 months ago
- [ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data … ☆660 Updated last week
- A bagel, with everything. ☆317 Updated 11 months ago
- [ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning ☆598 Updated last year
- awesome synthetic (text) datasets ☆265 Updated 4 months ago
- Manage scalable open LLM inference endpoints in Slurm clusters ☆253 Updated 8 months ago
- Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them ☆471 Updated 9 months ago
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extreme Length (ICLR 2024) ☆205 Updated 10 months ago
- The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction ☆382 Updated 8 months ago
- ☆307 Updated 9 months ago