rungalileo / hallucination-index
Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate.
☆100 Updated 2 months ago
Related projects
Alternatives and complementary repositories for hallucination-index
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆48 Updated 4 months ago
- ☆75 Updated 5 months ago
- Low latency, High Accuracy, Custom Query routers for Co-pilots and Agents. Built by Prithivi Da ☆52 Updated this week
- Open-source RAG evaluation through users' feedback ☆161 Updated 7 months ago
- A semantic research engine to get relevant papers based on a user query. Application frontend with Chainlit Copilot. Observability with L… ☆76 Updated 6 months ago
- ☆88 Updated 10 months ago
- Doing simple retrieval from LLMs at various context lengths to measure accuracy ☆97 Updated 7 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆72 Updated 2 months ago
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc on task … ☆134 Updated 2 months ago
- ☆45 Updated 7 months ago
- Building a chatbot powered with a RAG pipeline to read, summarize and quote the most relevant papers related to the user query. ☆162 Updated 6 months ago
- ☆105 Updated last month
- Writing Blog Posts with Generative Feedback Loops! ☆43 Updated 8 months ago
- ARAGOG: Advanced RAG Output Grading. Exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research paper… ☆96 Updated 7 months ago
- Repository to demonstrate Chain of Table reasoning with multiple tables powered by LangGraph ☆146 Updated 7 months ago
- ☆75 Updated 9 months ago
- Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23) ☆74 Updated 2 months ago
- Mistral + Haystack: build RAG pipelines that rock 🤘 ☆100 Updated 9 months ago
- Testing speed and accuracy of RAG with and without a Cross Encoder Reranker. ☆47 Updated 10 months ago
- Using LlamaIndex with Ray for productionizing LLM applications ☆71 Updated last year
- Experimental Code for StructuredRAG: Structured Outputs in Retrieval-Augmented Generation ☆94 Updated this week
- Generate Tools and Toolkits from any Python SDK -- no extra code required ☆49 Updated 2 weeks ago
- Chunk your text more accurately using gpt4o-mini ☆42 Updated 3 months ago
- Source code for our paper: "SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals". ☆65 Updated 4 months ago
- ☆78 Updated this week
- Sample notebooks and prompts for LLM evaluation ☆114 Updated last week
- Google DeepMind's PromptBreeder for automated prompt engineering, implemented in LangChain Expression Language. ☆65 Updated 3 months ago
- ☆12 Updated 7 months ago
- Dynamic Metadata based RAG Framework ☆71 Updated 3 months ago
- Data extraction with LLM on CPU ☆111 Updated 10 months ago