ai-evals-course / judgy
Python package for estimating confidence intervals (CIs) for metrics evaluated by LLM-as-Judges.
☆22Updated last month
Alternatives and similar repositories for judgy
Users that are interested in judgy are comparing it to the libraries listed below
- Check for data drift between two OpenAI multi-turn chat jsonl files.☆37Updated last year
- ☆35Updated last month
- ☆72Updated 7 months ago
- ☆41Updated last year
- 🤗 HuggingFace Inference Toolkit for Google Cloud Vertex AI (similar to SageMaker's Inference Toolkit, but for Vertex AI and unofficial)☆17Updated last year
- Framework for building and maintaining self-updating prompts for LLMs☆63Updated last year
- ☆78Updated last year
- Lightweight wrapper for the independent implementation of SPLADE++ models for search & retrieval pipelines. Models and Library created by…☆31Updated 10 months ago
- ☆39Updated 2 months ago
- Using various instructor clients to evaluate the quality and capabilities of extractions and reasoning.☆52Updated 9 months ago
- Prototyping a question and answer bot over PDFs☆39Updated last year
- Writing Blog Posts with Generative Feedback Loops!☆48Updated last year
- Jupyter Notebooks and an R Notebook for encoding Pokémon embeddings and creating data visualizations.☆19Updated last year
- Creating Generative AI Apps which work☆17Updated 2 months ago
- ☆8Updated 11 months ago
- Tokenization across languages. Useful as preprocessing for subword tokenization.☆22Updated 2 years ago
- Fork of https://github.com/o19s/elasticsearch-learning-to-rank to work with OpenSearch☆17Updated this week
- A framework for evaluating semantic search across custom datasets, metrics, and embedding backends.☆34Updated last month
- Plug-and-play document processing pipelines with zero-shot models.☆69Updated last month
- Just another sentiment wrapper.☆17Updated 3 years ago
- Iterate fast on your RAG pipelines☆23Updated last week
- Production-grade embedding generation, for any length of text, for transformer models.☆23Updated 2 weeks ago
- A library to use `modal` as a backend for `joblib`.☆29Updated 5 months ago
- 📝 Reference-Free automatic summarization evaluation with potential hallucination detection☆100Updated last year
- ☆30Updated 7 months ago
- A framework for evaluating function calls made by LLMs☆37Updated 11 months ago
- A personal knowledge base that I can dump information into and that helps me learn☆24Updated last month
- Small python package to measure OCR quality and other related metrics.☆23Updated last year
- ☆43Updated 2 years ago
- ☆70Updated last week