stanford-crfm / helm
Holistic Evaluation of Language Models (HELM) is an open-source Python framework created by the Center for Research on Foundation Models (CRFM) at Stanford for the holistic, reproducible, and transparent evaluation of foundation models, including large language models (LLMs) and multimodal models.
☆2,636 · Updated this week
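For context, a minimal sketch of driving HELM from Python via its command-line entry points (assuming `pip install crfm-helm`; the run-entry string and flags below follow HELM's quick-start example and may differ across HELM versions):

```python
# Minimal sketch: run a small HELM evaluation, then summarize the results.
# Assumes `pip install crfm-helm`; run-entry syntax and flag names follow
# HELM's quick-start example and may vary between versions.
import subprocess

# Evaluate one scenario/model pair on a handful of instances.
subprocess.run(
    [
        "helm-run",
        "--run-entries", "mmlu:subject=philosophy,model=openai/gpt2",
        "--suite", "my-suite",
        "--max-eval-instances", "10",
    ],
    check=True,
)

# Aggregate the raw run outputs into leaderboard-style tables.
subprocess.run(["helm-summarize", "--suite", "my-suite"], check=True)
```

After summarizing, HELM also ships a `helm-server` command that serves a local web UI for browsing the results.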
Alternatives and similar repositories for helm
Users interested in helm are comparing it to the libraries listed below
- Measuring Massive Multitask Language Understanding | ICLR 2021 ☆1,543 · Updated 2 years ago
- An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast. ☆1,936 · Updated 5 months ago
- Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models ☆3,189 · Updated last year
- Doing simple retrieval from LLMs at various context lengths to measure accuracy ☆2,151 · Updated last year
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆2,274 · Updated last week
- Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback" ☆1,808 · Updated 7 months ago
- TruthfulQA: Measuring How Models Imitate Human Falsehoods ☆870 · Updated last year
- The hub for EleutherAI's work on interpretability and learning dynamics ☆2,715 · Updated 2 months ago
- Benchmarking large language models' complex reasoning ability with chain-of-thought prompting ☆2,770 · Updated last year
- 800,000 step-level correctness labels on LLM solutions to MATH problems ☆2,083 · Updated 2 years ago
- ☆1,560 · Updated this week
- Toolkit for creating, sharing and using natural language prompts. ☆2,994 · Updated 2 years ago
- The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models". ☆1,590 · Updated 7 months ago
- Data and tools for generating and inspecting OLMo pre-training data. ☆1,393 · Updated 2 months ago
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ☆2,842 · Updated this week
- AllenAI's post-training codebase ☆3,538 · Updated this week
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆3,055 · Updated last week
- Reference implementation for DPO (Direct Preference Optimization) ☆2,832 · Updated last year
- A family of open-sourced Mixture-of-Experts (MoE) Large Language Models ☆1,653 · Updated last year
- YaRN: Efficient Context Window Extension of Large Language Models ☆1,664 · Updated last year
- Code for our EMNLP 2023 Paper: "LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models" ☆1,226 · Updated last year
- A modular RL library to fine-tune language models to human preferences ☆2,375 · Updated last year
- A framework for few-shot evaluation of language models. ☆11,246 · Updated this week
- ☆1,384 · Updated 2 years ago
- The papers are organized according to our survey: Evaluating Large Language Models: A Comprehensive Survey. ☆792 · Updated last year
- General technology for enabling AI capabilities w/ LLMs and MLLMs ☆4,262 · Updated last month
- Robust recipes to align language models with human and AI preferences ☆5,481 · Updated 4 months ago
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters ☆1,896 · Updated 2 years ago
- Minimalistic large language model 3D-parallelism training ☆2,422 · Updated last month
- A library for advanced large language model reasoning ☆2,324 · Updated 7 months ago