stanford-crfm / helm
Holistic Evaluation of Language Models (HELM) is an open-source Python framework created by the Center for Research on Foundation Models (CRFM) at Stanford for holistic, reproducible, and transparent evaluation of foundation models, including large language models (LLMs) and multimodal models.
☆2,549 · Updated this week
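A minimal sketch of how an evaluation might be kicked off with HELM from Python. It assumes the `crfm-helm` package is installed and that the `helm-run` / `helm-summarize` console scripts it provides are on PATH; the exact flag names and run-entry syntax vary across HELM releases, so treat this as an illustrative assumption rather than a definitive invocation.

```python
# Illustrative only: assumes `pip install crfm-helm` has been run and that the
# `helm-run` / `helm-summarize` console scripts are available. Flag names and
# run-entry syntax differ between HELM versions, so check `helm-run --help`.
import subprocess

suite = "my-suite"  # arbitrary label used to group this evaluation's outputs

# Evaluate a small slice of one scenario against one model.
subprocess.run(
    [
        "helm-run",
        "--run-entries", "mmlu:subject=philosophy,model=openai/gpt2",
        "--suite", suite,
        "--max-eval-instances", "10",
    ],
    check=True,
)

# Aggregate raw run outputs into the summary tables that `helm-server` displays.
subprocess.run(["helm-summarize", "--suite", suite], check=True)
```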
Alternatives and similar repositories for helm
Users interested in helm are comparing it to the libraries listed below.
- An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast. ☆1,906 · Updated 3 months ago
- Measuring Massive Multitask Language Understanding | ICLR 2021 ☆1,518 · Updated 2 years ago
- Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback" ☆1,799 · Updated 5 months ago
- The hub for EleutherAI's work on interpretability and learning dynamics ☆2,676 · Updated last week
- Benchmarking large language models' complex reasoning ability with chain-of-thought prompting ☆2,756 · Updated last year
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆2,126 · Updated this week
- Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models ☆3,147 · Updated last year
- Doing simple retrieval from LLMs at various context lengths to measure accuracy ☆2,074 · Updated last year
- General technology for enabling AI capabilities w/ LLMs and MLLMs ☆4,193 · Updated this week
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ☆2,742 · Updated this week
- Toolkit for creating, sharing and using natural language prompts. ☆2,972 · Updated 2 years ago
- YaRN: Efficient Context Window Extension of Large Language Models ☆1,637 · Updated last year
- Data and tools for generating and inspecting OLMo pre-training data. ☆1,345 · Updated 2 weeks ago
- ☆1,552 · Updated 3 weeks ago
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆2,932 · Updated this week
- TruthfulQA: Measuring How Models Imitate Human Falsehoods ☆840 · Updated 10 months ago
- The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models". ☆1,581 · Updated 5 months ago
- 800,000 step-level correctness labels on LLM solutions to MATH problems ☆2,073 · Updated 2 years ago
- Aligning pretrained language models with instruction data generated by themselves. ☆4,522 · Updated 2 years ago
- Robust recipes to align language models with human and AI preferences ☆5,427 · Updated 2 months ago
- AllenAI's post-training codebase ☆3,317 · Updated this week
- A family of open-sourced Mixture-of-Experts (MoE) Large Language Models ☆1,635 · Updated last year
- A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF) ☆4,728 · Updated last year
- A framework for few-shot evaluation of language models. ☆10,706 · Updated this week
- 🤗 Evaluate: A library for easily evaluating machine learning models and datasets. ☆2,365 · Updated last week
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters ☆1,868 · Updated last year
- A modular RL library to fine-tune language models to human preferences ☆2,367 · Updated last year
- ☆1,313 · Updated 8 months ago
- ☆1,324 · Updated last year
- A collection of open-source datasets for training instruction-following LLMs (ChatGPT, LLaMA, Alpaca) ☆1,134 · Updated last year