huggingface / evaluate
🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
⭐ 2,320 · Updated last month
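As a quick illustration of the library this page centers on, here is a minimal 🤗 Evaluate sketch. `evaluate.load` and `compute` are the library's real API; the metric choice and toy labels are just for the example:

```python
import evaluate

# Load a metric implementation by name from the Hugging Face Hub.
accuracy = evaluate.load("accuracy")

# Score toy predictions against toy references (3 of 4 match -> 0.75).
results = accuracy.compute(predictions=[0, 1, 1, 0], references=[0, 1, 0, 0])
print(results)  # {'accuracy': 0.75}
```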
Alternatives and similar repositories for evaluate
Users interested in evaluate are comparing it to the libraries listed below.
- A Unified Library for Parameter-Efficient and Modular Transfer Learning · ⭐ 2,766 · Updated last month
- ⭐ 1,538 · Updated 3 weeks ago
- 🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization… · ⭐ 3,075 · Updated last week
- Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models · ⭐ 3,113 · Updated last year
- The hub for EleutherAI's work on interpretability and learning dynamics · ⭐ 2,614 · Updated 3 months ago
- Holistic Evaluation of Language Models (HELM) is an open source Python framework created by the Center for Research on Foundation Models… · ⭐ 2,470 · Updated this week
- Efficient few-shot learning with Sentence Transformers · ⭐ 2,562 · Updated last month
- A modular RL library to fine-tune language models to human preferences · ⭐ 2,348 · Updated last year
- Toolkit for creating, sharing and using natural language prompts. · ⭐ 2,931 · Updated last year
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. · ⭐ 2,616 · Updated 3 weeks ago
- ⭐ 1,240 · Updated last year
- Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data. · ⭐ 1,006 · Updated last year
- BERT score for text generation · ⭐ 1,803 · Updated last year
- The implementation of DeBERTa · ⭐ 2,146 · Updated last year
- A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF) · ⭐ 4,710 · Updated last year
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… (see the usage sketch after this list) · ⭐ 9,133 · Updated this week
- ⭐ 2,882 · Updated 2 weeks ago
- PyTorch extensions for high performance and large scale training. · ⭐ 3,369 · Updated 4 months ago
- Model explainability that works seamlessly with 🤗 transformers. Explain your transformers model in just 2 lines of code (sketch after this list). · ⭐ 1,375 · Updated 2 years ago
- Cramming the training of a (BERT-type) language model into limited compute. · ⭐ 1,349 · Updated last year
- maximal update parametrization (µP) · ⭐ 1,599 · Updated last year
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends · ⭐ 1,890 · Updated this week
- General technology for enabling AI capabilities w/ LLMs and MLLMs · ⭐ 4,128 · Updated 2 months ago
- Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback" · ⭐ 1,782 · Updated 3 months ago
- MTEB: Massive Text Embedding Benchmark · ⭐ 2,836 · Updated this week
- Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations. · ⭐ 1,936 · Updated this week
- Organize your experiments into discrete steps that can be cached and reused throughout the lifetime of your research project. · ⭐ 562 · Updated last year
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 · ⭐ 1,415 · Updated last year
- A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets. · ⭐ 1,953 · Updated 3 months ago
- Measuring Massive Multitask Language Understanding | ICLR 2021 · ⭐ 1,493 · Updated 2 years ago
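For the "launch, train, and use PyTorch models on almost any device" entry above, a minimal sketch of the 🤗 Accelerate pattern. `Accelerator`, `prepare`, and `accelerator.backward` are the library's actual API; the model, optimizer, and data are toy stand-ins:

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()

# Toy model, optimizer, and data purely for illustration.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
dataset = torch.utils.data.TensorDataset(
    torch.randn(64, 10), torch.randint(0, 2, (64,))
)
dataloader = torch.utils.data.DataLoader(dataset, batch_size=8)

# prepare() moves everything to the right device(s) and wraps them so the
# same script runs on CPU, a single GPU, or a distributed setup.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

loss_fn = torch.nn.CrossEntropyLoss()
for inputs, targets in dataloader:
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    accelerator.backward(loss)  # replaces loss.backward()
    optimizer.step()
```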
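And for the model-explainability entry ("in just 2 lines of code"), a hedged sketch of what that usage looks like with transformers-interpret; the checkpoint name and input sentence are placeholders, and only `SequenceClassificationExplainer` is assumed from that library:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from transformers_interpret import SequenceClassificationExplainer

# Any sequence-classification checkpoint works; this one is just an example.
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# The advertised "2 lines": build an explainer, then call it on text to get
# per-word attribution scores for the predicted class.
explainer = SequenceClassificationExplainer(model, tokenizer)
word_attributions = explainer("I love this library")
print(word_attributions)
```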