rungalileo / hallucination-index
Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate.
☆108 · Updated 7 months ago
Alternatives and similar repositories for hallucination-index:
Users interested in hallucination-index are comparing it to the libraries listed below:
- Testing speed and accuracy of RAG with and without a cross-encoder reranker. ☆48 · Updated last year
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆49 · Updated 9 months ago
- This repository implements the Chain-of-Verification paper by Meta AI ☆168 · Updated last year
- Doing simple retrieval from LLMs at various context lengths to measure accuracy ☆99 · Updated last year
- LangChain chat model abstractions for dynamic failover, load balancing, chaos engineering, and more! ☆81 · Updated last year
- RAGElo is a set of tools that helps you select the best RAG-based LLM agents by using an Elo ranker ☆108 · Updated 3 weeks ago
- Sample notebooks and prompts for LLM evaluation ☆124 · Updated 2 weeks ago
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆78 · Updated 7 months ago
- Building a chatbot powered by a RAG pipeline to read, summarize, and quote the most relevant papers related to the user's query. ☆166 · Updated last year
- Mixing Language Models with Self-Verification and Meta-Verification ☆104 · Updated 4 months ago
- ☆36 · Updated 9 months ago
- Just a bunch of benchmark logs for different LLMs ☆119 · Updated 9 months ago
- ☆88 · Updated last year
- ☆143 · Updated 9 months ago
- ARAGOG: Advanced RAG Output Grading. Exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research paper… ☆102 · Updated last year
- Codebase accompanying the Summary of a Haystack paper. ☆77 · Updated 7 months ago
- Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners" ☆107 · Updated 7 months ago
- A tool that facilitates easy, efficient and high-quality fine-tuning of Cohere's models ☆73 · Updated last month
- Simple examples using Argilla tools to build AI ☆52 · Updated 5 months ago
- Low-latency, high-accuracy custom query routers for humans and agents. Built by Prithivi Da ☆102 · Updated last month
- Function Calling Benchmark & Testing ☆87 · Updated 9 months ago
- Open-source RAG evaluation through users' feedback ☆182 · Updated last year
- Client Code Examples, Use Cases and Benchmarks for Enterprise h2oGPTe RAG-Based GenAI Platform ☆87 · Updated last week
- Writing Blog Posts with Generative Feedback Loops! ☆47 · Updated last year
- Google DeepMind's PromptBreeder for automated prompt engineering, implemented in LangChain Expression Language. ☆105 · Updated 9 months ago
- ☆60 · Updated last year
- ☆84 · Updated last year
- ☆77 · Updated 11 months ago
- Experimental Code for StructuredRAG: JSON Response Formatting with Large Language Models ☆105 · Updated 3 weeks ago
- Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23) ☆76 · Updated 2 months ago