ai-evals-course / judgy
Python package for estimating confidence intervals (CIs) for metrics evaluated by LLM-as-judges.
☆74Updated 5 months ago
Alternatives and similar repositories for judgy
Users that are interested in judgy are comparing it to the libraries listed below
- Easiest way to give context to LLMs; Attachments has the ambition to be the general funnel for any files to be transformed into images+te…☆321Updated 2 months ago
- ☆84Updated last year
- ☆82Updated 2 months ago
- A comprehensive 0-to-1 guide for building self-improving LLM applications with DSPy framework☆193Updated last month
- Simple UI for debugging correlations of text embeddings☆299Updated 5 months ago
- Extract structured data from any content using LLMs.☆55Updated this week
- Minimal agent runtime built with DSPy modules and a thin Python loop. Includes CLI, FastAPI server, and eval harness with OpenAI/Ollama s…☆64Updated 2 months ago
- A Lightweight Library for AI Observability☆251Updated 8 months ago
- Deep Research for your internal data☆348Updated 5 months ago
- A small library of LLM judges☆302Updated 3 months ago
- Plug-and-play, zero-shot document processing pipelines.☆113Updated this week
- Get a markdown version of any webpage with a keyboard shortcut.☆67Updated 9 months ago
- ☆36Updated 6 months ago
- Low latency, High Accuracy, Custom Query routers for Humans and Agents. Built by Prithivi Da☆117Updated 7 months ago
- Uses various instructor clients to evaluate the quality and capabilities of extractions and reasoning.☆52Updated last year
- Kura is a simple reproduction of the CLIO paper which uses language models to label user behaviour before clustering them based on embedd…☆362Updated 2 months ago
- ☆77Updated last year
- ☆74Updated last year
- Dynamic Metadata based RAG Framework☆78Updated last year
- Train embedding and reranker models for retrieval tasks on Apple Silicon with MLX☆167Updated 2 months ago
- LLM prompt language based on Jinja. Banks provides tools and functions to build prompt text and chat messages from generic blueprints. I…☆116Updated 4 months ago
- This repo is the central repo for all the RAG Evaluation reference material and partner workshop☆76Updated 6 months ago
- ☆53Updated 7 months ago
- Inference-time scaling for LLMs-as-a-judge.☆310Updated 2 weeks ago
- A reimplementation of langgraph's customer support example in Rasa's CALM paradigm and a quantitative evaluation of the two approaches☆81Updated 7 months ago
- Open-source versioning, tracing, and annotation tooling.☆204Updated last week
- This codebase demonstrates various DSPy functionalities through practical examples.☆53Updated 9 months ago
- Writing Blog Posts with Generative Feedback Loops!☆50Updated last year
- Vibe-coding tools for the LlamaIndex ecosystem☆173Updated 2 weeks ago
- Prompt design in Python☆63Updated 11 months ago