stanford-crfm / helm
Holistic Evaluation of Language Models (HELM) is an open-source Python framework created by the Center for Research on Foundation Models (CRFM) at Stanford for holistic, reproducible, and transparent evaluation of foundation models, including large language models (LLMs) and multimodal models.
☆2,301 · Updated this week
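As a rough orientation, the sketch below follows the quick-start pattern from HELM's documentation: install the package, run a small benchmark slice, then summarize and browse the results. The run entry (`mmlu:subject=philosophy,model=openai/gpt2`), suite name, and instance cap are illustrative placeholders, and exact flag names can vary between releases.

```bash
# Install the framework (assumption: published on PyPI as crfm-helm)
pip install crfm-helm

# Run a small evaluation; run entry, suite name, and instance cap are illustrative
helm-run --run-entries mmlu:subject=philosophy,model=openai/gpt2 \
         --suite my-suite --max-eval-instances 10

# Aggregate the results and browse them in a local web UI
helm-summarize --suite my-suite
helm-server --suite my-suite
```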
Alternatives and similar repositories for helm
Users interested in helm are comparing it to the libraries listed below
- An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast. ☆1,773 · Updated 5 months ago
- Measuring Massive Multitask Language Understanding | ICLR 2021 ☆1,434 · Updated 2 years ago
- Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback" ☆1,759 · Updated last week
- Expanding natural instructions ☆1,006 · Updated last year
- ☆1,527 · Updated last week
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆1,897 · Updated 10 months ago
- Data and tools for generating and inspecting OLMo pre-training data. ☆1,250 · Updated this week
- General technology for enabling AI capabilities w/ LLMs and MLLMs ☆4,032 · Updated last week
- Benchmarking large language models' complex reasoning ability with chain-of-thought prompting ☆2,736 · Updated 10 months ago
- Robust recipes to align language models with human and AI preferences ☆5,235 · Updated last month
- The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models". ☆1,539 · Updated 3 weeks ago
- Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models ☆3,069 · Updated 11 months ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆1,641 · Updated this week
- AllenAI's post-training codebase ☆3,028 · Updated this week
- A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs). ☆859 · Updated 2 weeks ago
- Toolkit for creating, sharing and using natural language prompts. ☆2,887 · Updated last year
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ☆2,426 · Updated this week
- 800,000 step-level correctness labels on LLM solutions to MATH problems ☆2,015 · Updated 2 years ago
- Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads ☆2,551 · Updated last year
- A library for advanced large language model reasoning ☆2,148 · Updated 2 weeks ago
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆2,773 · Updated this week
- Code for our EMNLP 2023 Paper: "LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models" ☆1,180 · Updated last year
- Reading list of instruction tuning. A trend starting from Natural-Instruction (ACL 2022), FLAN (ICLR 2022), and T0 (ICLR 2022). ☆769 · Updated last year
- 🤗 Evaluate: A library for easily evaluating machine learning models and datasets. ☆2,241 · Updated this week
- The hub for EleutherAI's work on interpretability and learning dynamics ☆2,547 · Updated 2 weeks ago
- Aligning Large Language Models with Human: A Survey ☆730 · Updated last year
- Reference implementation for DPO (Direct Preference Optimization) ☆2,619 · Updated 10 months ago
- A framework for few-shot evaluation of language models. ☆9,379 · Updated this week
- A modular RL library to fine-tune language models to human preferences ☆2,317 · Updated last year
- This includes the original implementation of SELF-RAG: Learning to Retrieve, Generate and Critique through self-reflection by Akari Asai,… ☆2,113 · Updated last year