patronus-ai / Lynx-hallucination-detection
☆31 Updated 7 months ago
Alternatives and similar repositories for Lynx-hallucination-detection:
Users who are interested in Lynx-hallucination-detection are comparing it to the libraries listed below:
- Codebase accompanying the Summary of a Haystack paper. ☆74 Updated 4 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆54 Updated 5 months ago
- Code for NeurIPS LLM Efficiency Challenge ☆55 Updated 10 months ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆48 Updated 7 months ago
- ☆49 Updated 2 months ago
- ☆116 Updated 4 months ago
- ☆48 Updated 3 months ago
- Mixing Language Models with Self-Verification and Meta-Verification ☆100 Updated 2 months ago
- ☆27 Updated 3 months ago
- The code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System ☆103 Updated 8 months ago
- ☆24 Updated last year
- Evaluating LLMs with CommonGen-Lite ☆88 Updated 10 months ago
- Train, tune, and infer Bamba model ☆83 Updated 3 weeks ago
- ☆55 Updated 3 months ago
- ☆87 Updated last year
- NeurIPS 2023 - Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer ☆40 Updated 10 months ago
- ☆59 Updated this week
- The Official Repository for "Bring Your Own Data! Self-Supervised Evaluation for Large Language Models" ☆108 Updated last year
- Code accompanying "How I learned to start worrying about prompt formatting". ☆102 Updated 4 months ago
- Source code of the paper: RetrievalQA: Assessing Adaptive Retrieval-Augmented Generation for Short-form Open-Domain Question Answering [F… ☆62 Updated 8 months ago
- ReBase: Training Task Experts through Retrieval Based Distillation ☆28 Updated last week
- Using open source LLMs to build synthetic datasets for direct preference optimization ☆57 Updated 11 months ago
- Scalable Meta-Evaluation of LLMs as Evaluators ☆42 Updated last year
- Set of scripts to finetune LLMs ☆36 Updated 10 months ago
- ☆48 Updated last year
- Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners" ☆98 Updated 5 months ago
- WorkBench: a Benchmark Dataset for Agents in a Realistic Workplace Setting. ☆37 Updated 6 months ago
- ☆62 Updated 3 weeks ago