huggingface / evaluate
🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
⭐ 2,335 · Updated last week
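A minimal sketch of the typical evaluate workflow: load a metric from the Hub by name, then compute it over predictions and references. The metric name and the toy inputs below are illustrative.

```python
# pip install evaluate
import evaluate

# Load a metric implementation by name ("accuracy" here as an example).
accuracy = evaluate.load("accuracy")

# Compute over parallel lists of model predictions and gold references.
results = accuracy.compute(
    predictions=[0, 1, 1, 0],
    references=[0, 1, 0, 0],
)
print(results)  # {'accuracy': 0.75}
```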
Alternatives and similar repositories for evaluate
Users interested in evaluate are comparing it to the libraries listed below.
- 🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy-to-use hardware optimization… · ⭐ 3,104 · Updated last week
- Efficient few-shot learning with Sentence Transformers (few-shot sketch after this list) · ⭐ 2,574 · Updated 2 months ago
- A modular RL library to fine-tune language models to human preferences · ⭐ 2,355 · Updated last year
- Beyond the Imitation Game: a collaborative benchmark for measuring and extrapolating the capabilities of language models · ⭐ 3,125 · Updated last year
- The hub for EleutherAI's work on interpretability and learning dynamics · ⭐ 2,625 · Updated 3 months ago
- The implementation of DeBERTa · ⭐ 2,154 · Updated 2 years ago
- A Unified Library for Parameter-Efficient and Modular Transfer Learning · ⭐ 2,772 · Updated last month
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. · ⭐ 2,660 · Updated last week
- Holistic Evaluation of Language Models (HELM) is an open source Python framework created by the Center for Research on Foundation Models… · ⭐ 2,492 · Updated last week
- A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF) · ⭐ 4,711 · Updated last year
- General technology for enabling AI capabilities with LLMs and MLLMs · ⭐ 4,144 · Updated 3 months ago
- PyTorch extensions for high-performance and large-scale training. · ⭐ 3,376 · Updated 5 months ago
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… (training-loop sketch after this list) · ⭐ 9,180 · Updated last week
- Toolkit for creating, sharing and using natural language prompts. · ⭐ 2,943 · Updated last year
- Model explainability that works seamlessly with 🤗 transformers. Explain your transformers model in just 2 lines of code. (explainer sketch after this list) · ⭐ 1,381 · Updated 2 years ago
- Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data. · ⭐ 1,005 · Updated last year
- MTEB: Massive Text Embedding Benchmark (benchmark-run sketch after this list) · ⭐ 2,876 · Updated this week
- Accessible large language models via k-bit quantization for PyTorch. (quantized-loading sketch after this list) · ⭐ 7,627 · Updated this week
- A Heterogeneous Benchmark for Information Retrieval. Easy to use: evaluate your models across 15+ diverse IR datasets. · ⭐ 1,970 · Updated 4 months ago
- Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations. (retrieval sketch after this list) · ⭐ 1,946 · Updated this week
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends · ⭐ 1,973 · Updated last week
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 · ⭐ 1,420 · Updated last year
- Original implementation of Prompt Tuning from Lester et al., 2021 · ⭐ 694 · Updated 7 months ago
- Cramming the training of a (BERT-type) language model into limited compute. · ⭐ 1,349 · Updated last year
- Foundation Architecture for (M)LLMs · ⭐ 3,117 · Updated last year
- Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback" · ⭐ 1,784 · Updated 3 months ago
- Organize your experiments into discrete steps that can be cached and reused throughout the lifetime of your research project. · ⭐ 563 · Updated last year
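The entries marked with forward references above are sketched below; in every sketch the checkpoints, datasets, and names are illustrative placeholders, not the repositories' canonical examples.

Few-shot sketch: this assumes the SetFit package's pre-1.0 API (SetFitModel and SetFitTrainer) behind the Sentence Transformers few-shot entry.

```python
# pip install setfit datasets  (assumes the pre-1.0 SetFitTrainer API)
from datasets import load_dataset
from setfit import SetFitModel, SetFitTrainer

# Small labeled subset to mimic the few-shot setting.
dataset = load_dataset("SetFit/sst2")
train_ds = dataset["train"].shuffle(seed=42).select(range(64))
eval_ds = dataset["validation"]

# Start from any Sentence Transformers checkpoint (placeholder choice).
model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")

trainer = SetFitTrainer(model=model, train_dataset=train_ds, eval_dataset=eval_ds)
trainer.train()
print(trainer.evaluate())
```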
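Training-loop sketch: the core Accelerate pattern is to wrap model, optimizer, and dataloader with Accelerator.prepare and call accelerator.backward(loss) instead of loss.backward(); the toy model and data are placeholders.

```python
# pip install accelerate torch
import torch
from accelerate import Accelerator

accelerator = Accelerator()  # detects device and distributed configuration

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = torch.utils.data.TensorDataset(
    torch.randn(256, 10), torch.randint(0, 2, (256,))
)
dataloader = torch.utils.data.DataLoader(dataset, batch_size=32)

# prepare() moves everything to the right device(s) and wraps for DDP if needed.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, targets in dataloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), targets)
    accelerator.backward(loss)  # replaces loss.backward()
    optimizer.step()
```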
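Explainer sketch: assuming the transformers-interpret package behind the "2 lines of code" entry; the checkpoint and input text are illustrative.

```python
# pip install transformers-interpret
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from transformers_interpret import SequenceClassificationExplainer

model_name = "distilbert-base-uncased-finetuned-sst-2-english"  # placeholder
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# The advertised "2 lines": build an explainer, then call it on text.
explainer = SequenceClassificationExplainer(model, tokenizer)
word_attributions = explainer("This movie was a pleasant surprise.")
print(word_attributions)  # list of (token, attribution score) pairs
```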
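Benchmark-run sketch: running MTEB on a Sentence Transformers model via the long-standing MTEB(tasks=...) entry point; the task name and output folder are illustrative, and newer mteb releases expose a slightly different task-selection API.

```python
# pip install mteb sentence-transformers
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # any model with .encode()

# Evaluate on one task; pass more task names to widen the benchmark.
evaluation = MTEB(tasks=["Banking77Classification"])
results = evaluation.run(model, output_folder="results/all-MiniLM-L6-v2")
```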
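Quantized-loading sketch: bitsandbytes is usually driven through transformers' BitsAndBytesConfig; the model id is a placeholder and a CUDA GPU is assumed.

```python
# pip install transformers accelerate bitsandbytes  (CUDA GPU assumed)
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(load_in_8bit=True)  # or load_in_4bit=True

model_id = "facebook/opt-1.3b"  # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # let accelerate place the quantized weights
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=10)[0]))
```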
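Retrieval sketch: BM25 search with Pyserini over a prebuilt index; the index name and query are illustrative, and the Lucene backend needs a Java runtime.

```python
# pip install pyserini  (requires Java 11+ for the Lucene backend)
from pyserini.search.lucene import LuceneSearcher

# Download and open a prebuilt index (name is illustrative; see Pyserini docs).
searcher = LuceneSearcher.from_prebuilt_index("msmarco-v1-passage")

hits = searcher.search("what is the capital of france", k=5)
for hit in hits:
    print(f"{hit.docid}\t{hit.score:.4f}")
```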