allenai / OLMo-Eval

Evaluation suite for LLMs

☆366

Alternatives and similar repositories for OLMo-Eval

Users who are interested in OLMo-Eval are comparing it to the repositories listed below.

FranxYao / Long-Context-Data-Engineering
Implementation of paper Data Engineering for Scaling Language Models to 128K Context
☆477 Updated last year
lm-sys / llm-decontaminator
Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples"
☆314 Updated last year
JinjieNi / MixEval
The official evaluation suite and dynamic data release for MixEval.
☆253 Updated last year
huggingface / cosmopedia
☆552 Updated last year
epfLLM / Megatron-LLM
distributed trainer for LLMs
☆583 Updated last year
jzhang38 / EasyContext
Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
☆750 Updated last year
xfactlab / orpo
Official repository for ORPO
☆465 Updated last year
Re-Align / URIAL
☆313 Updated last year
allenai / olmes
Reproducible, flexible LLM evaluations
☆266 Updated this week
datamllab / LongLM
[ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
☆660 Updated last year
allenai / WildBench
Benchmarking LLMs with Challenging Tasks from Real Users
☆246 Updated last year
allenai / fm-cheatsheet
Website for hosting the Open Foundation Models Cheat Sheet.
☆268 Updated 6 months ago
nomic-ai / contrastors
Train Models Contrastively in PyTorch
☆754 Updated 7 months ago
declare-lab / instruct-eval
This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.
☆551 Updated last year
allenai / dolma
Data and tools for generating and inspecting OLMo pre-training data.
☆1,345 Updated 2 weeks ago
GAIR-NLP / MathPile
[NeurIPS D&B 2024] Generative AI for Math: MathPile
☆418 Updated 7 months ago
OpenBMB / Eurus
☆320 Updated last year
OpenLemur / Lemur
[ICLR 2024] Lemur: Open Foundation Models for Language Agents
☆554 Updated 2 years ago
ezelikman / quiet-star
Code for Quiet-STaR
☆741 Updated last year
mlfoundations / open_lm
A repository for research on medium sized language models.
☆518 Updated 5 months ago
dwzhu-pku / PoSE
Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extreme Length (ICLR 2024)
☆204 Updated last year
huggingface / llm-swarm
Manage scalable open LLM inference endpoints in Slurm clusters
☆277 Updated last year
pratyushasharma / laser
The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction
☆390 Updated last year
huggingface / llm_training_handbook
An open collection of methodologies to help with successful training of large language models.
☆539 Updated last year
allenai / OLMo-core
PyTorch building blocks for the OLMo ecosystem
☆319 Updated this week
TIGER-AI-Lab / MAmmoTH
Code and data for "MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning" [ICLR 2024]
☆376 Updated last year
OpenLMLab / LEval
[ACL'24 Outstanding] Data and code for L-Eval, a comprehensive long context language models evaluation benchmark
☆391 Updated last year
WildEval / ZeroEval
A simple unified framework for evaluating LLMs
☆254 Updated 7 months ago
ContextualAI / gritlm
Generative Representational Instruction Tuning
☆678 Updated 4 months ago
OpenBMB / InfiniteBench
Codes for the paper "∞Bench: Extending Long Context Evaluation Beyond 100K Tokens": https://arxiv.org/abs/2402.13718
☆355 Updated last year