huggingface / evaluate
🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
⭐ 2,241 · Updated this week
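For orientation, here is a minimal sketch of typical usage, assuming the library is installed (`pip install evaluate`) and the `accuracy` metric script can be fetched from the Hugging Face Hub:

```python
# Minimal sketch: load a metric from the Hub and score toy predictions.
import evaluate

accuracy = evaluate.load("accuracy")  # downloads the metric script on first use
results = accuracy.compute(predictions=[0, 1, 1, 0], references=[0, 1, 0, 0])
print(results)  # e.g. {'accuracy': 0.75}
```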
Alternatives and similar repositories for evaluate
Users interested in evaluate are comparing it to the libraries listed below:
- A Unified Library for Parameter-Efficient and Modular Transfer Learning ⭐ 2,721 · Updated 3 weeks ago
- Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models ⭐ 3,069 · Updated 11 months ago
- ⭐ 1,527 · Updated last week
- 🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization… ⭐ 2,950 · Updated this week
- ⭐ 2,834 · Updated 3 weeks ago
- Accessible large language models via k-bit quantization for PyTorch. ⭐ 7,150 · Updated this week
- Holistic Evaluation of Language Models (HELM) is an open source Python framework created by the Center for Research on Foundation Models… ⭐ 2,301 · Updated this week
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… ⭐ 8,860 · Updated this week
- General technology for enabling AI capabilities w/ LLMs and MLLMs ⭐ 4,032 · Updated last week
- A modular RL library to fine-tune language models to human preferences ⭐ 2,317 · Updated last year
- Toolkit for creating, sharing and using natural language prompts. ⭐ 2,887 · Updated last year
- A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF) ⭐ 4,674 · Updated last year
- An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast. ⭐ 1,773 · Updated 5 months ago
- MTEB: Massive Text Embedding Benchmark ⭐ 2,626 · Updated this week
- A collection of open-source datasets to train instruction-following LLMs (ChatGPT, LLaMA, Alpaca) ⭐ 1,120 · Updated last year
- The hub for EleutherAI's work on interpretability and learning dynamics ⭐ 2,547 · Updated 2 weeks ago
- Measuring Massive Multitask Language Understanding | ICLR 2021 ⭐ 1,434 · Updated 2 years ago
- A framework for few-shot evaluation of language models. ⭐ 9,379 · Updated this week
- ⭐ 1,224 · Updated 10 months ago
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ⭐ 2,426 · Updated this week
- A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets. ⭐ 1,847 · Updated 3 weeks ago
- Expanding natural instructions ⭐ 1,006 · Updated last year
- Organize your experiments into discrete steps that can be cached and reused throughout the lifetime of your research project. ⭐ 561 · Updated last year
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ⭐ 1,641 · Updated this week
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ⭐ 1,397 · Updated last year
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. ⭐ 2,020 · Updated 3 months ago
- Aligning pretrained language models with instruction data generated by themselves. ⭐ 4,396 · Updated 2 years ago
- Efficient few-shot learning with Sentence Transformers ⭐ 2,509 · Updated 2 months ago
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration ⭐ 3,101 · Updated 2 weeks ago
- Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers". ⭐ 2,131 · Updated last year