prometheus-eval / prometheus-evalLinks

Evaluate your LLM's response with Prometheus and GPT4 💯

☆978

Alternatives and similar repositories for prometheus-eval

Users that are interested in prometheus-eval are comparing it to the libraries listed below

Sorting:

mlabonne / llm-autoeval
Automatically evaluate your LLMs in Google Colab
☆649Updated last year
stanford-futuredata / ARES
Automated Evaluation of RAG Systems
☆637Updated 4 months ago
BatsResearch / bonito
A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.
☆783Updated 3 weeks ago
ContextualAI / gritlm
Generative Representational Instruction Tuning
☆664Updated last month
huggingface / cosmopedia
☆529Updated 8 months ago
huggingface / lighteval
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
☆1,793Updated this week
xfactlab / orpo
Official repository for ORPO
☆461Updated last year
potsawee / selfcheckgpt
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
☆549Updated last year
datadreamer-dev / DataDreamer
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤
☆1,041Updated 6 months ago
stanfordnlp / pyreft
Stanford NLP Python library for Representation Finetuning (ReFT)
☆1,500Updated 6 months ago
argilla-io / distilabel
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi…
☆2,833Updated last week
predibase / llm_distillation_playbook
Best practices for distilling large language models.
☆569Updated last year
EdinburghNLP / awesome-hallucination-detection
List of papers on hallucination detection in LLMs.
☆930Updated last month
castorini / rank_llm
RankLLM is a Python toolkit for reproducible information retrieval research using rerankers, with a focus on listwise reranking.
☆508Updated this week
chujiezheng / chat_templates
Chat Templates for 🤗 HuggingFace Large Language Models
☆690Updated 7 months ago
magpie-align / magpie
[ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data …
☆744Updated 4 months ago
google-deepmind / long-form-factuality
Benchmarking long-form factuality in large language models. Original code for our paper "Long-form factuality in large language models".
☆627Updated 3 weeks ago
huggingface / text-clustering
Easily embed, cluster and semantically label text datasets
☆560Updated last year
trotsky1997 / MathBlackBox
☆1,028Updated 7 months ago
SalesforceAIResearch / xLAM
xLAM: A Family of Large Action Models to Empower AI Agent Systems
☆513Updated this week
Leeroo-AI / mergoo
A library for easily merging multiple LLM experts, and efficiently train the merged LLM.
☆488Updated 11 months ago
madaan / self-refine
LLMs can generate feedback on their work, use it to improve the output, and repeat this process iteratively.
☆719Updated 10 months ago
ezelikman / quiet-star
Code for Quiet-STaR
☆737Updated 11 months ago
PAIR-code / llm-comparator
LLM Comparator is an interactive data visualization tool for evaluating and analyzing LLM responses side-by-side, developed by the PAIR t…
☆464Updated 5 months ago
prometheus-eval / prometheus
[ICLR 2024 & NeurIPS 2023 WS] An Evaluator LM that is open-source, offers reproducible evaluation, and inexpensive to use. Specifically d…
☆303Updated last year
xhluca / bm25s
Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy
☆1,267Updated 2 months ago
tencent-ailab / persona-hub
Official repo for the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas"
☆1,265Updated 5 months ago
ContextualAI / HALOs
A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).
☆877Updated 3 weeks ago
tianyi-lab / Reflection_Tuning
[ACL'24] Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning
☆360Updated 11 months ago
huggingface / datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
☆2,516Updated this week