zeno-ml / zeno-hub
AI Evaluation Platform
☆46 · Updated last week
Alternatives and similar repositories for zeno-hub
Users interested in zeno-hub are comparing it to the repositories listed below.
- Writing Blog Posts with Generative Feedback Loops! ☆48 · Updated last year
- Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles ☆33 · Updated 3 weeks ago
- Small, simple agent task environments for training and evaluation ☆18 · Updated 7 months ago
- ☆29 · Updated 6 months ago
- ☆41 · Updated 4 months ago
- Sphynx Hallucination Induction ☆54 · Updated 4 months ago
- ☆58 · Updated 2 weeks ago
- ☆49 · Updated 6 months ago
- ReLM is a Regular Expression engine for Language Models ☆105 · Updated 2 years ago
- A framework for evaluating function calls made by LLMs ☆37 · Updated 10 months ago
- ☆38 · Updated 10 months ago
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆56 · Updated 5 months ago
- Just a bunch of benchmark logs for different LLMs ☆119 · Updated 10 months ago
- Check for data drift between two OpenAI multi-turn chat jsonl files. ☆37 · Updated last year
- Pre-training code for CrystalCoder 7B LLM ☆54 · Updated last year
- ☆23 · Updated last year
- Data preparation code for CrystalCoder 7B LLM ☆44 · Updated last year
- Code accompanying "How I learned to start worrying about prompt formatting". ☆105 · Updated 8 months ago
- Functional Benchmarks and the Reasoning Gap ☆86 · Updated 8 months ago
- ☆33 · Updated 3 months ago
- An attribution library for LLMs ☆41 · Updated 8 months ago
- NeurIPS 2023 - Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer ☆43 · Updated last year
- Official repo for NAACL 2024 Findings paper "LeTI: Learning to Generate from Textual Interactions." ☆65 · Updated last year
- Mixing Language Models with Self-Verification and Meta-Verification ☆104 · Updated 5 months ago
- ☆83 · Updated last month
- AI Data Management & Evaluation Platform ☆215 · Updated last year
- A framework for optimizing DSPy programs with RL ☆58 · Updated this week
- Evaluating LLMs with CommonGen-Lite ☆90 · Updated last year
- ☆43 · Updated 3 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆173 · Updated 2 months ago