allenai / OLMo-Eval
Evaluation suite for LLMs
☆365 Updated 3 months ago
Alternatives and similar repositories for OLMo-Eval
Users who are interested in OLMo-Eval are comparing it to the libraries listed below.
- ☆546 Updated 11 months ago
- Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples" ☆311 Updated last year
- Implementation of paper Data Engineering for Scaling Language Models to 128K Context ☆477 Updated last year
- Official repository for ORPO ☆464 Updated last year
- The official evaluation suite and dynamic data release for MixEval. ☆250 Updated 11 months ago
- This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks. ☆548 Updated last year
- Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware. ☆750 Updated last year
- [ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning ☆661 Updated last year
- The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction ☆389 Updated last year
- Train Models Contrastively in Pytorch ☆753 Updated 7 months ago
- Benchmarking LLMs with Challenging Tasks from Real Users ☆244 Updated 11 months ago
- Reproducible, flexible LLM evaluations ☆260 Updated this week
- ☆274 Updated 2 years ago
- Extend existing LLMs way beyond the original training length with constant memory usage, without retraining ☆723 Updated last year
- Automatic evals for LLMs ☆550 Updated 4 months ago
- PyTorch building blocks for the OLMo ecosystem ☆311 Updated this week
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extremely Length (ICLR 2024) ☆204 Updated last year
- Learning to Compress Prompts with Gist Tokens - https://arxiv.org/abs/2304.08467 ☆297 Updated 8 months ago
- Generative Representational Instruction Tuning ☆675 Updated 4 months ago
- Manage scalable open LLM inference endpoints in Slurm clusters ☆273 Updated last year
- A repository for research on medium sized language models. ☆515 Updated 4 months ago
- ☆313 Updated last year
- [ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning ☆631 Updated last year
- Code for Quiet-STaR ☆739 Updated last year
- Code and data for "Lost in the Middle: How Language Models Use Long Contexts" ☆361 Updated last year
- Code and data for "MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning" [ICLR 2024] ☆377 Updated last year
- 🐙 OctoPack: Instruction Tuning Code Large Language Models ☆472 Updated 8 months ago
- Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs" ☆470 Updated last year
- [ACL'24 Outstanding] Data and code for L-Eval, a comprehensive long context language models evaluation benchmark ☆390 Updated last year
- [ICLR 2024] Lemur: Open Foundation Models for Language Agents ☆554 Updated 2 years ago