sam-paech / diplobench
Benchmark for LLMs playing full-press Diplomacy
☆43 Updated 2 months ago
Alternatives and similar repositories for diplobench
Users who are interested in diplobench are comparing it to the repositories listed below:
- explore token trajectory trees on instruct and base models ☆106 Updated this week
- ☆38 Updated 9 months ago
- ☆46 Updated this week
- look how they massacred my boy ☆63 Updated 7 months ago
- PageRank for LLMs ☆41 Updated last month
- A distributed agent orchestration framework for market agents ☆92 Updated 3 weeks ago
- ☆66 Updated 11 months ago
- A strongly typed Python DSL for developing message passing multi agent systems ☆52 Updated last year
- ☆48 Updated last year
- Verdict is a library for scaling judge-time compute. ☆211 Updated 2 weeks ago
- ☆55 Updated 2 months ago
- A framework for orchestrating AI agents using a mermaid graph ☆75 Updated last year
- Chat Markup Language conversation library ☆55 Updated last year
- Minimal example of MCP for parsing llms.txt ☆38 Updated last month
- ☆28 Updated 7 months ago
- Tools to make language models a bit easier to use ☆44 Updated 2 weeks ago
- Certified Reasoning with Language Models ☆31 Updated last year
- ☆21 Updated 6 months ago
- Train your own SOTA deductive reasoning model ☆92 Updated 2 months ago
- Training code for Sparse Autoencoders on Embedding models ☆38 Updated 2 months ago
- Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**. ☆37 Updated last week
- Synthetic data derived by templating, few shot prompting, transformations on public domain corpora, and monte carlo tree search. ☆32 Updated 2 months ago
- A tree-based prefix cache library that allows rapid creation of looms: hierarchal branching pathways of LLM generations. ☆68 Updated 3 months ago
- ☆54 Updated 3 months ago
- Replace expensive LLM calls with finetunes automatically ☆65 Updated last year
- smol models are fun too ☆92 Updated 6 months ago
- ☆97 Updated 7 months ago
- Verbosity control for AI agents ☆63 Updated 11 months ago
- ☆14 Updated last month
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna