withmartian / routerbench
The code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System
☆92 Updated 5 months ago
Related projects
Alternatives and complementary repositories for routerbench
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆128 Updated 3 weeks ago
- A simple unified framework for evaluating LLMs ☆145 Updated last week
- ☆112 Updated last month
- ☆46 Updated 2 weeks ago
- The official repo for "LLoCo: Learning Long Contexts Offline" ☆113 Updated 5 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆72 Updated 2 months ago
- Repository for the paper Stream of Search: Learning to Search in Language ☆91 Updated 3 months ago
- ☆127 Updated 3 months ago
- Evaluating LLMs with fewer examples ☆134 Updated 7 months ago
- Scalable Meta-Evaluation of LLMs as Evaluators ☆41 Updated 9 months ago
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024) ☆115 Updated last week
- Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization" ☆78 Updated 8 months ago
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore". ☆129 Updated this week
- ☆40 Updated 2 weeks ago
- Open Implementations of LLM Analyses ☆94 Updated last month
- Evaluating LLMs with CommonGen-Lite ☆85 Updated 8 months ago
- ☆49 Updated 6 months ago
- ☆93 Updated last year
- The Official Repository for "Bring Your Own Data! Self-Supervised Evaluation for Large Language Models" ☆109 Updated last year
- ☆101 Updated 3 months ago
- Functional Benchmarks and the Reasoning Gap ☆78 Updated last month
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap… ☆110 Updated 3 weeks ago
- ☆90 Updated 4 months ago
- ☆54 Updated last month
- Benchmarking LLMs with Challenging Tasks from Real Users ☆195 Updated 2 weeks ago
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of… ☆87 Updated 3 months ago
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' ☆161 Updated last month
- ☆21 Updated last week
- A repository for transformer critique learning and generation ☆86 Updated 11 months ago
- ☆102 Updated last month