stanford-crfm / helm
Holistic Evaluation of Language Models (HELM) is an open-source Python framework created by the Center for Research on Foundation Models (CRFM) at Stanford for the holistic, reproducible, and transparent evaluation of foundation models, including large language models (LLMs) and multimodal models.
☆2,480 · Updated this week
Alternatives and similar repositories for helm
Users interested in helm are comparing it to the libraries listed below.
- An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast. ☆1,858 · Updated last month
- Measuring Massive Multitask Language Understanding | ICLR 2021 ☆1,497 · Updated 2 years ago
- ☆1,540 · Updated last month
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆2,028 · Updated last year
- Benchmarking large language models' complex reasoning ability with chain-of-thought prompting ☆2,745 · Updated last year
- Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback" ☆1,782 · Updated 3 months ago
- Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models ☆3,119 · Updated last year
- The hub for EleutherAI's work on interpretability and learning dynamics ☆2,621 · Updated 3 months ago
- 800,000 step-level correctness labels on LLM solutions to MATH problems ☆2,048 · Updated 2 years ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆1,942 · Updated this week
- The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models". ☆1,564 · Updated 3 months ago
- General technology for enabling AI capabilities w/ LLMs and MLLMs ☆4,135 · Updated 2 months ago
- Data and tools for generating and inspecting OLMo pre-training data. ☆1,314 · Updated this week
- A modular RL library to fine-tune language models to human preferences ☆2,350 · Updated last year
- Toolkit for creating, sharing and using natural language prompts. ☆2,937 · Updated last year
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ☆2,636 · Updated this week
- TruthfulQA: Measuring How Models Imitate Human Falsehoods ☆809 · Updated 8 months ago
- AllenAI's post-training codebase ☆3,199 · Updated this week
- Robust recipes to align language models with human and AI preferences ☆5,373 · Updated 2 weeks ago
- YaRN: Efficient Context Window Extension of Large Language Models ☆1,610 · Updated last year
- 🤗 Evaluate: A library for easily evaluating machine learning models and datasets. ☆2,323 · Updated this week
- A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs). ☆886 · Updated 2 months ago
- Reference implementation for DPO (Direct Preference Optimization) ☆2,735 · Updated last year
- A family of open-sourced Mixture-of-Experts (MoE) Large Language Models ☆1,593 · Updated last year
- Expanding natural instructions ☆1,018 · Updated last year
- [ACL2023] We introduce LLM-Blender, an innovative ensembling framework to attain consistently superior performance by leveraging the dive… ☆963 · Updated 11 months ago
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters ☆1,853 · Updated last year
- ☆1,307 · Updated last year
- Code for our EMNLP 2023 Paper: "LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models" ☆1,203 · Updated last year
- Aligning pretrained language models with instruction data generated by themselves. ☆4,475 · Updated 2 years ago