allenai / OLMo-Eval
Evaluation suite for LLMs
☆350 · Updated 2 months ago
Alternatives and similar repositories for OLMo-Eval
Users interested in OLMo-Eval are comparing it to the libraries listed below.
- ☆520 · Updated 7 months ago
- Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware. ☆732 · Updated 9 months ago
- Generative Representational Instruction Tuning ☆654 · Updated 3 months ago
- Data and tools for generating and inspecting OLMo pre-training data. ☆1,250 · Updated this week
- [ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning ☆653 · Updated last year
- Implementation of the paper "Data Engineering for Scaling Language Models to 128K Context" ☆463 · Updated last year
- Distributed trainer for LLMs ☆577 · Updated last year
- Official repository for ORPO ☆455 · Updated last year
- Reproducible, flexible LLM evaluations ☆214 · Updated last month
- Code for Quiet-STaR ☆734 · Updated 10 months ago
- This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks. ☆545 · Updated last year
- What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets ☆221 · Updated 7 months ago
- [ACL 2024] Progressive LLaMA with Block Expansion. ☆505 · Updated last year
- Train Models Contrastively in PyTorch ☆723 · Updated 3 months ago
- RewardBench: the first evaluation tool for reward models. ☆604 · Updated 2 weeks ago
- The official evaluation suite and dynamic data release for MixEval. ☆243 · Updated 7 months ago
- Scalable toolkit for efficient model alignment ☆818 · Updated 3 weeks ago
- A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs). ☆859 · Updated 2 weeks ago
- Scaling Data-Constrained Language Models ☆335 · Updated 9 months ago
- The official implementation of Self-Play Fine-Tuning (SPIN) ☆1,167 · Updated last year
- [ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning ☆617 · Updated last year
- ☆310 · Updated last year
- DSIR, a large-scale data selection framework for language model training ☆251 · Updated last year
- ☆318 · Updated 9 months ago
- A curated list of human preference datasets for LLM fine-tuning, RLHF, and evaluation. ☆367 · Updated last year
- A large-scale, fine-grained, diverse preference dataset (and models). ☆342 · Updated last year
- Manage scalable open LLM inference endpoints in Slurm clusters ☆261 · Updated 11 months ago
- Extend existing LLMs well beyond their original training length with constant memory usage, without retraining ☆699 · Updated last year
- A bagel, with everything. ☆321 · Updated last year
- Implementation of the training framework proposed in "Self-Rewarding Language Models", from Meta AI ☆1,386 · Updated last year