anyscale / factuality-eval
Library of IPython notebooks for evaluating factuality.
★51 · Updated last year
Alternatives and similar repositories for factuality-eval
Users interested in factuality-eval are comparing it to the libraries listed below.
- Datasets and models for instruction-tuning · ★238 · Updated last year
- ★78 · Updated last year
- Sample notebooks and prompts for LLM evaluation · ★138 · Updated 2 months ago
- ★80 · Updated 2 years ago
- ★207 · Updated last year
- Notebooks for training universal 0-shot classifiers on many different tasks · ★135 · Updated 7 months ago
- Topic modeling helpers using managed language models from Cohere. Name text clusters using large GPT models. · ★223 · Updated 2 years ago
- Check for data drift between two OpenAI multi-turn chat jsonl files. · ★37 · Updated last year
- TitanML Takeoff Server is an optimization, compression and deployment platform that makes state of the art machine learning models access… · ★114 · Updated last year
- ★167 · Updated this week
- Reference-free automatic summarization evaluation with potential hallucination detection · ★102 · Updated last year
- Find and fix bugs in natural language machine learning models using adaptive testing. · ★184 · Updated last year
- Fast & more realistic evaluation of chat language models. Includes leaderboard. · ★188 · Updated last year
- Supervised instruction finetuning for LLMs with the HF trainer and DeepSpeed · ★35 · Updated 2 years ago
- ★90 · Updated last year
- Domain Adapted Language Modeling Toolkit - E2E RAG · ★327 · Updated 9 months ago
- Classy-fire is a multiclass text classification approach leveraging OpenAI LLM APIs optimally using clever parameter tuning and promp… · ★79 · Updated last year
- ★87 · Updated last year
- Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate. · ★113 · Updated 2 weeks ago
- ★56 · Updated last month
- Stanford CRFM's initiative to assess potential compliance with the draft EU AI Act · ★94 · Updated last year
- ★206 · Updated last year
- Python package for estimating confidence intervals (CIs) for metrics evaluated by LLM judges · ★36 · Updated 2 months ago
- ★47 · Updated 2 years ago
- AI Data Management & Evaluation Platform · ★215 · Updated last year
- Simple retrieval from LLMs at various context lengths to measure accuracy · ★102 · Updated last year
- Lightweight wrapper for the independent implementation of SPLADE++ models for search & retrieval pipelines. Models and library created by… · ★32 · Updated 11 months ago
- Course for Interpreting ML Models · ★52 · Updated 2 years ago
- RAGElo is a set of tools that helps you select the best RAG-based LLM agents using an Elo ranker · ★114 · Updated last month
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. · ★434 · Updated last year