stanford-crfm / helm
Holistic Evaluation of Language Models (HELM) is an open-source Python framework created by the Center for Research on Foundation Models (CRFM) at Stanford for holistic, reproducible, and transparent evaluation of foundation models, including large language models (LLMs) and multimodal models.
☆2,167 · Updated this week
Alternatives and similar repositories for helm:
Users interested in helm are comparing it to the libraries listed below.
- Reference implementation for DPO (Direct Preference Optimization) ☆2,523 · Updated 8 months ago
- The hub for EleutherAI's work on interpretability and learning dynamics ☆2,456 · Updated last month
- An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast. ☆1,718 · Updated 3 months ago
- A family of open-sourced Mixture-of-Experts (MoE) Large Language Models ☆1,508 · Updated last year
- Measuring Massive Multitask Language Understanding | ICLR 2021 ☆1,387 · Updated last year
- Benchmarking large language models' complex reasoning ability with chain-of-thought prompting ☆2,714 · Updated 8 months ago
- Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback" ☆1,719 · Updated last year
- 800,000 step-level correctness labels on LLM solutions to MATH problems ☆1,973 · Updated last year
- Doing simple retrieval from LLMs at various context lengths to measure accuracy ☆1,824 · Updated 8 months ago
- General technology for enabling AI capabilities with LLMs and MLLMs ☆3,930 · Updated this week
- Code for our EMNLP 2023 paper "LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models" ☆1,151 · Updated last year
- The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models" ☆1,514 · Updated 2 weeks ago
- Aligning pretrained language models with instruction data generated by themselves ☆4,344 · Updated 2 years ago
- A modular RL library to fine-tune language models to human preferences ☆2,298 · Updated last year
- ☆1,511 · Updated this week
- A repo for distributed training of language models with Reinforcement Learning from Human Feedback (RLHF) ☆4,621 · Updated last year
- Toolkit for creating, sharing, and using natural language prompts ☆2,820 · Updated last year
- YaRN: Efficient Context Window Extension of Large Language Models ☆1,468 · Updated last year
- A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs) ☆830 · Updated last week
- A framework for the evaluation of autoregressive code generation language models ☆930 · Updated 5 months ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆1,414 · Updated last week
- A framework for few-shot evaluation of language models ☆8,645 · Updated this week
- The papers are organized according to our survey "Evaluating Large Language Models: A Comprehensive Survey" ☆754 · Updated 11 months ago
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆2,642 · Updated last week
- Aligning Large Language Models with Human: A Survey ☆726 · Updated last year
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters ☆1,817 · Updated last year
- Representation Engineering: A Top-Down Approach to AI Transparency ☆818 · Updated 8 months ago
- A collection of open-source datasets to train instruction-following LLMs (ChatGPT, LLaMA, Alpaca) ☆1,113 · Updated last year
- An Open-source Toolkit for LLM Development ☆2,770 · Updated 3 months ago
- A library for advanced large language model reasoning ☆2,084 · Updated last week