huggingface / evaluate
🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
★2,359 · Updated this week
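As a quick orientation before the list below, here is a minimal sketch of the library's core load-then-compute pattern; the metric name and the toy predictions/references are illustrative, not taken from this page:

```python
# Minimal 🤗 Evaluate usage: load a metric by name, then compute it.
import evaluate

accuracy = evaluate.load("accuracy")  # downloads the metric script on first use
results = accuracy.compute(
    predictions=[0, 1, 1],  # model outputs (toy values)
    references=[0, 1, 0],   # ground-truth labels (toy values)
)
print(results)  # e.g. {'accuracy': 0.666...}
```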
Alternatives and similar repositories for evaluate
Users interested in evaluate are comparing it to the libraries listed below.
- 🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy-to-use hardware optimization… ★3,164 · Updated 2 weeks ago
- A Unified Library for Parameter-Efficient and Modular Transfer Learning ★2,783 · Updated last month
- Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models ★3,144 · Updated last year
- A modular RL library to fine-tune language models to human preferences ★2,366 · Updated last year
- Holistic Evaluation of Language Models (HELM) is an open source Python framework created by the Center for Research on Foundation Models… ★2,539 · Updated this week
- Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data. ★1,006 · Updated last year
- A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF) ★4,728 · Updated last year
- Model explainability that works seamlessly with 🤗 transformers. Explain your transformers model in just 2 lines of code. (usage sketch below) ★1,392 · Updated 2 years ago
- General technology for enabling AI capabilities w/ LLMs and MLLMs ★4,175 · Updated this week
- Toolkit for creating, sharing and using natural language prompts. ★2,967 · Updated 2 years ago
- Efficient few-shot learning with Sentence Transformers (usage sketch below) ★2,598 · Updated 3 months ago
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… (usage sketch below) ★9,289 · Updated this week
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ★2,726 · Updated this week
- PyTorch extensions for high performance and large scale training. ★3,385 · Updated 6 months ago
- The hub for EleutherAI's work on interpretability and learning dynamics ★2,671 · Updated 5 months ago
- A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets. ★1,996 · Updated last month
- The implementation of DeBERTa ★2,164 · Updated 2 years ago
- Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations. (usage sketch below) ★1,978 · Updated this week
- Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback" ★1,796 · Updated 5 months ago
- MTEB: Massive Text Embedding Benchmark (usage sketch below) ★2,964 · Updated this week
- BERT score for text generation (usage sketch below) ★1,838 · Updated last year
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ★2,108 · Updated this week
- Original Implementation of Prompt Tuning from Lester et al., 2021 ★696 · Updated 8 months ago
- SGPT: GPT Sentence Embeddings for Semantic Search ★873 · Updated last year
- Measuring Massive Multitask Language Understanding | ICLR 2021 ★1,514 · Updated 2 years ago
- Explain, analyze, and visualize NLP language models. Ecco creates interactive visualizations directly in Jupyter notebooks explaining the… ★2,062 · Updated last year
- Dense Passage Retriever is a set of tools and models for open-domain Q&A tasks. ★1,843 · Updated 2 years ago
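A minimal sketch of the transformers-interpret pattern referenced above ("2 lines of code"); the checkpoint name and input sentence are illustrative assumptions:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from transformers_interpret import SequenceClassificationExplainer

# Any sequence-classification checkpoint works; this one is illustrative.
name = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(name)
tokenizer = AutoTokenizer.from_pretrained(name)

# The advertised "2 lines": build an explainer, then call it on text.
explainer = SequenceClassificationExplainer(model, tokenizer)
word_attributions = explainer("I love this movie!")  # per-token attribution scores
print(explainer.predicted_class_name, word_attributions)
```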
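A few-shot SetFit sketch for the entry above. SetFitTrainer reflects the v0-era API (newer releases expose setfit.Trainer); the dataset, checkpoint, and 16-example budget are illustrative assumptions:

```python
from datasets import load_dataset
from setfit import SetFitModel, SetFitTrainer

# Few-shot setup: only 16 labelled examples (illustrative budget).
train_ds = load_dataset("sst2", split="train").shuffle(seed=42).select(range(16))

model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")
trainer = SetFitTrainer(
    model=model,
    train_dataset=train_ds,
    column_mapping={"sentence": "text", "label": "label"},  # sst2 column names
)
trainer.train()

preds = model.predict(["a deeply moving film", "a waste of two hours"])
```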
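A minimal Accelerate sketch for the entry above: wrap existing PyTorch objects with prepare() and swap loss.backward() for accelerator.backward(). The toy model and data are illustrative:

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()  # detects device / distributed config automatically

# Toy model, optimizer, and data for illustration.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
dataset = torch.utils.data.TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
loader = torch.utils.data.DataLoader(dataset, batch_size=8)

# prepare() moves everything to the right device(s) and wraps for DDP if needed.
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

for inputs, targets in loader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), targets)
    accelerator.backward(loss)  # replaces loss.backward()
    optimizer.step()
```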
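A sparse (BM25) retrieval sketch for the Pyserini entry above; the prebuilt index name follows recent Pyserini releases and is an assumption here:

```python
from pyserini.search.lucene import LuceneSearcher

# from_prebuilt_index downloads and caches the index on first use (several GB).
searcher = LuceneSearcher.from_prebuilt_index("msmarco-v1-passage")
hits = searcher.search("what is a lobster roll", k=5)

for hit in hits:
    print(f"{hit.docid}\t{hit.score:.4f}")
```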
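A single-task MTEB sketch for the entry above; entry points have shifted across MTEB versions, and the model and task names are illustrative:

```python
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")       # any encode()-capable model
evaluation = MTEB(tasks=["Banking77Classification"])  # one task; the full benchmark has many
results = evaluation.run(model, output_folder="results")
```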
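A BERTScore sketch for the entry above; the functional API returns per-pair precision/recall/F1 tensors, and the candidate/reference sentences are toy examples:

```python
from bert_score import score

candidates = ["The cat sat on the mat."]        # model outputs (toy)
references = ["A cat was sitting on the mat."]  # gold references (toy)

# Returns torch tensors with one value per candidate/reference pair.
P, R, F1 = score(candidates, references, lang="en")
print(f"F1: {F1.mean().item():.4f}")
```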