zeno-ml / zeno-hub
AI Evaluation Platform
☆46 · Updated last month
Alternatives and similar repositories for zeno-hub
Users who are interested in zeno-hub are comparing it to the libraries listed below:
- ☆30 · Updated 7 months ago
- Chat Markup Language conversation library · ☆55 · Updated last year
- Small, simple agent task environments for training and evaluation · ☆18 · Updated 7 months ago
- Mixing Language Models with Self-Verification and Meta-Verification · ☆104 · Updated 6 months ago
- A framework for evaluating function calls made by LLMs · ☆37 · Updated 11 months ago
- ☆77 · Updated last year
- Writing Blog Posts with Generative Feedback Loops! · ☆48 · Updated last year
- Leverage your LangChain trace data for fine tuning · ☆41 · Updated 10 months ago
- Hugging Face Inference Toolkit used to serve transformers, sentence-transformers, and diffusers models. · ☆78 · Updated 2 weeks ago
- Reasoning by Communicating with Agents · ☆29 · Updated last month
- ☆47 · Updated last year
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" · ☆57 · Updated 6 months ago
- Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles · ☆35 · Updated last month
- Sphynx Hallucination Induction · ☆54 · Updated 4 months ago
- Just a bunch of benchmark logs for different LLMs · ☆119 · Updated 10 months ago
- Check for data drift between two OpenAI multi-turn chat jsonl files. · ☆37 · Updated last year
- 🔔🧠 Easily experiment with popular language agents across diverse reasoning/decision-making benchmarks! · ☆52 · Updated 3 months ago
- LLM finetuning · ☆42 · Updated last year
- ReLM is a Regular Expression engine for Language Models · ☆106 · Updated 2 years ago
- Evaluating LLMs with CommonGen-Lite · ☆90 · Updated last year
- A specification for OpenInference, a semantic mapping of ML inferences · ☆47 · Updated last year
- A set of utilities for running few-shot prompting experiments on large-language models · ☆121 · Updated last year
- Doing simple retrieval from LLM models at various context lengths to measure accuracy · ☆99 · Updated last year
- Python library to use Pleias-RAG models · ☆57 · Updated last month
- Official repo for NAACL 2024 Findings paper "LeTI: Learning to Generate from Textual Interactions." · ☆65 · Updated last year
- ☆94 · Updated 6 months ago
- Code accompanying "How I learned to start worrying about prompt formatting". · ☆105 · Updated 2 weeks ago
- ☆23 · Updated last year
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo) · ☆80 · Updated 3 months ago
- Functional Benchmarks and the Reasoning Gap · ☆87 · Updated 8 months ago