patronus-ai / Lynx-hallucination-detection

☆41

Alternatives and similar repositories for Lynx-hallucination-detection

Users who are interested in Lynx-hallucination-detection are comparing it to the repositories listed below.

salesforce / summary-of-a-haystack
Codebase accompanying the Summary of a Haystack paper.
☆79 Updated 10 months ago
daniel-furman / sft-demos
Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets.
☆77 Updated 9 months ago
automix-llm / automix
Mixing Language Models with Self-Verification and Meta-Verification
☆105 Updated 7 months ago
SALT-NLP / demonstrated-feedback
☆125 Updated 10 months ago
wang-research-lab / agentinstruct
Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners"
☆114 Updated 10 months ago
aymeric-roucher / agent_reasoning_benchmark
🔧 Compare how Agent systems perform on several benchmarks. 📊🚀
☆99 Updated 9 months ago
deshwalmahesh / PHUDGE
Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute…
☆49 Updated last year
zetaalphavector / RAGElo
RAGElo is a set of tools that helps you select the best RAG-based LLM agents by using an Elo ranker
☆114 Updated 3 weeks ago
akjindal53244 / Arithmo
Small and Efficient Mathematical Reasoning LLMs
☆71 Updated last year
mungg / FABLES
☆57 Updated 10 months ago
felipemaiapolo / tinyBenchmarks
Evaluating LLMs with fewer examples
☆160 Updated last year
neulab / ragged
Retrieval Augmented Generation Generalized Evaluation Dataset
☆54 Updated 2 weeks ago
Upaya07 / NeurIPS-llm-efficiency-challenge
Code for NeurIPS LLM Efficiency Challenge
☆59 Updated last year
austrian-code-wizard / c3po
☆29 Updated this week
JinjieNi / MixEval
The official evaluation suite and dynamic data release for MixEval.
☆242 Updated 8 months ago
allenai / catwalk
This project studies the performance and robustness of language models and task-adaptation methods.
☆150 Updated last year
hyintell / RetrievalQA
Source code of the paper: RetrievalQA: Assessing Adaptive Retrieval-Augmented Generation for Short-form Open-Domain Question Answering [F…
☆66 Updated last year
yueyu1030 / AttrPrompt
[NeurIPS 2023] This is the code for the paper `Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias`.
☆152 Updated last year
pacman100 / peft-codegen-25
☆23 Updated 2 years ago
bespokelabsai / verifiers
Verifiers for LLM Reinforcement Learning
☆68 Updated 3 months ago
writer / writing-in-the-margins
☆118 Updated 11 months ago
LLM360 / Analysis360
Open Implementations of LLM Analyses
☆105 Updated 9 months ago
h2oai / h2o-LLM-eval
Large-language Model Evaluation framework with Elo Leaderboard and A-B testing
☆52 Updated 9 months ago
arcee-ai / DAM
☆53 Updated 8 months ago
huggingface / data-is-better-together
Let's build better datasets, together!
☆260 Updated 7 months ago
apple / ml-superposition-prompting
☆145 Updated last year
huggingface / lm-evaluation-harness
A framework for few-shot evaluation of language models.
☆34 Updated 4 months ago
ConsequentAI / fneval
Functional Benchmarks and the Reasoning Gap
☆88 Updated 10 months ago
huggingface / llm-swarm
Manage scalable open LLM inference endpoints in Slurm clusters
☆268 Updated last year
GAIR-NLP / scaleeval
Scalable Meta-Evaluation of LLMs as Evaluators
☆42 Updated last year