google-deepmind / onetwo
☆185, updated 4 months ago
Alternatives and similar repositories for onetwo:
Users who are interested in onetwo are comparing it to the libraries listed below.
- The code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System ☆101, updated 7 months ago
- ☆115, updated this week
- Automatic Evals for Instruction-Tuned Models ☆100, updated this week
- Textbook on reinforcement learning from human feedback ☆111, updated this week
- Functional Benchmarks and the Reasoning Gap ☆82, updated 3 months ago
- Comprehensive analysis of difference in performance of QLora, Lora, and Full Finetunes. ☆82, updated last year
- A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use… ☆82, updated this week
- Website for hosting the Open Foundation Models Cheat Sheet. ☆262, updated 6 months ago
- TapeAgents is a framework that facilitates all stages of the LLM Agent development lifecycle ☆182, updated this week
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆99, updated 9 months ago
- Just a bunch of benchmark logs for different LLMs ☆116, updated 5 months ago
- Let's build better datasets, together! ☆244, updated 3 weeks ago
- ☆163, updated 7 months ago
- Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning" ☆278, updated last month
- Draw more samples ☆182, updated 6 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆154, updated 2 months ago
- AgentLab: An open-source framework for developing, testing, and benchmarking web agents on diverse tasks, designed for scalability and re… ☆189, updated this week
- ☆108, updated 3 months ago
- ☆49, updated 7 months ago
- Manage scalable open LLM inference endpoints in Slurm clusters ☆247, updated 6 months ago
- Code and Data for Tau-Bench ☆253, updated last week
- ☆77, updated 7 months ago
- awesome synthetic (text) datasets ☆253, updated 2 months ago
- Automating enterprise workflows with multimodal agents ☆97, updated 3 months ago
- ☆147, updated last month
- 🔧 Compare how Agent systems perform on several benchmarks. 📊🚀 ☆54, updated 2 months ago
- Extract full next-token probabilities via language model APIs ☆230, updated 10 months ago
- Hugging Face Deep Learning Containers (DLCs) for Google Cloud ☆136, updated this week
- A programming framework for agentic AI. Discord: https://discord.gg/pAbnFJrkgZ ☆119, updated 2 months ago
- Evaluating LLMs with CommonGen-Lite ☆87, updated 9 months ago