OpenGPTX / lm-evaluation-harness

A framework for few-shot evaluation of autoregressive language models.

☆12

Alternatives and similar repositories for lm-evaluation-harness

Users who are interested in lm-evaluation-harness compare it to the libraries listed below.

arcee-ai / DAM
☆55Updated last year
TRI-ML / linear_open_lm
A repository for research on medium-sized language models.
☆78Updated last year
epoch-research / training-cost-trends
☆20Updated last month
CosineAI / experiments
Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task.
☆15Updated last year
recursal / GoldFinch-paper
GoldFinch and other hybrid transformer components
☆45Updated last year
kyegomez / OpenStrawberry
An open source replication of the strawberry method that leverages Monte Carlo Search with PPO and/or DPO
☆29Updated this week
TheDuckAI / arb
Advanced Reasoning Benchmark Dataset for LLMs
☆47Updated 2 years ago
ContextualAI / CLAIR_and_APO
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
☆60Updated last year
akjindal53244 / Arithmo
Small and Efficient Mathematical Reasoning LLMs
☆72Updated last year
austrian-code-wizard / c3po
☆29Updated 3 months ago
penfever / wildchat-50m
Code, results and other artifacts from the paper introducing the WildChat-50m dataset and the Re-Wild model family.
☆31Updated 7 months ago
ctlllll / understanding_llm_benchmarks
Understanding the correlation between different LLM benchmarks
☆29Updated last year
CodeClash-ai / CodeClash
🆕 Benchmarking Goal-Oriented Software Engineering
☆41Updated this week
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48Updated last year
facebookresearch / lss_eval
This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he…
☆31Updated 2 years ago
graphcore / Gradient-HuggingFace
Tasks and tutorials using Graphcore's IPU with Hugging Face. Originally at https://github.com/gradient-ai/Graphcore-HuggingFace
☆16Updated last year
HazyResearch / embroid
Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification
☆11Updated 2 years ago
official-elinas / zeus-llm-trainer
Zeus LLM Trainer is a rewrite of Stanford Alpaca aiming to be the trainer for all Large Language Models
☆69Updated 2 years ago
automix-llm / automix
Mixing Language Models with Self-Verification and Meta-Verification
☆110Updated 11 months ago
RobertCsordas / moe
Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers"
☆38Updated 5 months ago
anyscale / long-context-fine-tuning-blogpost
☆17Updated last year
kyegomez / MobileVLM
Implementation of the LDP module block in PyTorch and Zeta from the paper: "MobileVLM: A Fast, Strong and Open Vision Language Assistant …
☆15Updated last year
HazyResearch / aioli
Aioli: A unified optimization framework for language model data mixing
☆28Updated 10 months ago
Zyphra / Zyda_processing
☆39Updated last year
vmarinowski / infini-attention
An unofficial PyTorch implementation of 'Efficient Infinite Context Transformers with Infini-attention'
☆54Updated last year
foundation-model-stack / bamba
Train, tune, and run inference with the Bamba model
☆136Updated 5 months ago
HishamAlyahya / semantic_backprop
Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖
☆76Updated 11 months ago
zaydzuhri / flame
Fork of Flame repo for training of some new stuff in development
☆19Updated last week
SebastianBodza / EnsembleForecasting
Using multiple LLMs for ensemble Forecasting
☆16Updated last year
huggingface / peft-pytorch-conference
Code for the examples presented in the talk "Training a Llama in your backyard: fine-tuning very large models on consumer hardware" given…
☆14Updated 2 years ago