sam-paech / diplobench
Benchmark for LLMs playing full press diplomacy
☆53Updated 5 months ago
Alternatives and similar repositories for diplobench
Users who are interested in diplobench are comparing it to the libraries listed below
- ☆123Updated last year
- Plotting (entropy, varentropy) for small LMs☆98Updated 2 months ago
- An introduction to LLM Sampling☆79Updated 7 months ago
- look how they massacred my boy☆63Updated 9 months ago
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens☆146Updated 5 months ago
- ☆56Updated last month
- A graph visualization of attention☆57Updated 2 months ago
- ☆146Updated 7 months ago
- A framework for orchestrating AI agents using a mermaid graph☆77Updated last year
- smolLM with Entropix sampler on pytorch☆150Updated 9 months ago
- explore token trajectory trees on instruct and base models☆134Updated 2 months ago
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments☆88Updated 10 months ago
- A framework for optimizing DSPy programs with RL☆127Updated this week
- ☆108Updated 4 months ago
- An automated tool for discovering insights from research paper corpora☆138Updated last year
- Interactive timeline of AI history☆58Updated 2 months ago
- Implementation of the board game Codenames, re-imagined as a collaborative game between LLM agents☆109Updated 5 months ago
- Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**.☆45Updated 3 months ago
- ☆66Updated last year
- smol models are fun too☆92Updated 9 months ago
- Claude Deep Research config for Claude Code.☆210Updated 4 months ago
- Train an adapter for any embedding model in under a minute☆110Updated 4 months ago
- The State Of The Art, intelligence☆149Updated last week
- Train your own SOTA deductive reasoning model☆104Updated 5 months ago
- Inference-time scaling for LLMs-as-a-judge.☆272Updated 3 weeks ago
- A user interface for DSPy☆167Updated 2 months ago
- ☆38Updated last year
- A distributed agent orchestration framework for market agents☆105Updated this week
- Simple Graph Memory for AI applications☆89Updated 2 months ago
- The open-source implementation of Q*, achieved in context as a zero-shot reprogramming of the attention mechanism. (synthetic data)☆1Updated 8 months ago