zeno-ml / zeno-hub
AI Evaluation Platform
☆46 Updated 4 months ago
Alternatives and similar repositories for zeno-hub
Users who are interested in zeno-hub compare it to the libraries listed below.
- Mixing Language Models with Self-Verification and Meta-Verification ☆109 Updated 9 months ago
- Chat Markup Language conversation library ☆55 Updated last year
- Small, simple agent task environments for training and evaluation ☆18 Updated 11 months ago
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆104 Updated 3 weeks ago
- An attribution library for LLMs ☆42 Updated last year
- A framework for evaluating function calls made by LLMs ☆38 Updated last year
- Just a bunch of benchmark logs for different LLMs ☆118 Updated last year
- Sphynx Hallucination Induction ☆53 Updated 8 months ago
- Writing Blog Posts with Generative Feedback Loops! ☆50 Updated last year
- Official Repo for CRMArena and CRMArena-Pro ☆118 Updated 3 months ago
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆64 Updated 10 months ago
- NeurIPS 2023 - Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer ☆44 Updated last year
- AI Data Management & Evaluation Platform ☆216 Updated 2 years ago
- ☆31 Updated 10 months ago
- Chrome Extension for exploring Hugging Face datasets 🔎 ☆48 Updated last year
- Open Implementations of LLM Analyses ☆107 Updated last year
- High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡ ☆67 Updated 11 months ago
- ReLM is a Regular Expression engine for Language Models ☆106 Updated 2 years ago
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆34 Updated 5 months ago
- Python library to use Pleias-RAG models ☆63 Updated 5 months ago
- ☆49 Updated 8 months ago
- A set of utilities for running few-shot prompting experiments on large-language models ☆123 Updated last year
- 🔔🧠 Easily experiment with popular language agents across diverse reasoning/decision-making benchmarks! ☆54 Updated 3 months ago
- ☆46 Updated last year
- Leverage your LangChain trace data for fine tuning ☆46 Updated last year
- Large-language Model Evaluation framework with Elo Leaderboard and A-B testing ☆52 Updated 11 months ago
- ☆197 Updated last year
- 🦄 Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the world's largest catalog of tools and data … ☆210 Updated last week
- Evaluating tool-augmented LLMs in conversation settings ☆88 Updated last year
- Small and Efficient Mathematical Reasoning LLMs ☆72 Updated last year