ai-evals-course / judgy
Python package for estimating confidence intervals (CIs) for metrics evaluated by LLM-as-Judges.
☆36Updated 2 months ago
Alternatives and similar repositories for judgy
Users that are interested in judgy are comparing it to the libraries listed below
- Check for data drift between two OpenAI multi-turn chat jsonl files.☆37Updated last year
- A framework for evaluating semantic search across custom datasets, metrics, and embedding backends.☆35Updated 2 months ago
- ☆78Updated last year
- Plug-and-play document processing pipelines with zero-shot models.☆86Updated 2 weeks ago
- ☆78Updated 9 months ago
- ☆100Updated this week
- A webhook that integrates the W&B model registry with Modal Labs☆15Updated last year
- ☆54Updated last year
- ☆157Updated 2 weeks ago
- ☆170Updated last year
- A small library of LLM judges☆251Updated 2 weeks ago
- Lightweight wrapper for the independent implementation of SPLADE++ models for search & retrieval pipelines. Models and Library created by…☆32Updated 11 months ago
- Website for Applied-LLMs work☆27Updated 2 months ago
- Production-grade embedding generation, for any length of text, for transformer models.☆23Updated 2 months ago
- Repo for Hamel's Professional Website☆73Updated last week
- Doing simple retrieval from LLM models at various context lengths to measure accuracy☆102Updated last year
- Official Repository for EvalRS @ KDD 2023: a Rounded Evaluation of Recommender Systems☆30Updated last year
- A Python library aimed at dissecting and augmenting NER training data.☆58Updated 2 years ago
- A PaaS End-to-End ML Setup with Metaflow, Serverless and SageMaker.☆37Updated 4 years ago
- ☆43Updated 2 years ago
- ☆43Updated 2 years ago
- Template-based generation of DAG cards from Metaflow classes, inspired by Google cards for machine learning models.☆30Updated 3 years ago
- ☆53Updated 3 weeks ago
- ☆155Updated 8 months ago
- 📝 Reference-Free automatic summarization evaluation with potential hallucination detection☆102Updated last year
- Codes, scripts, and notebooks on various aspects of transformer models.☆27Updated 2 years ago
- Low latency, High Accuracy, Custom Query routers for Humans and Agents. Built by Prithivi Da☆113Updated 4 months ago
- Leverage your LangChain trace data for fine tuning☆44Updated last year
- Prompt Engineering Workshop @ AI Convention 2025 (IHK Schwaben)☆63Updated 2 months ago
- Command Line Interface for Hugging Face Inference Endpoints☆66Updated last year