sam-paech / diplobench
Benchmark for LLMs playing full-press Diplomacy
☆54 Updated 5 months ago
Alternatives and similar repositories for diplobench
Users who are interested in diplobench are comparing it to the libraries listed below.
- ☆123 Updated last year
- A framework for optimizing DSPy programs with RL ☆150 Updated this week
- An introduction to LLM Sampling ☆79 Updated 8 months ago
- ☆155 Updated 8 months ago
- Train your own SOTA deductive reasoning model ☆104 Updated 5 months ago
- ☆157 Updated 8 months ago
- look how they massacred my boy ☆63 Updated 10 months ago
- A framework for orchestrating AI agents using a mermaid graph ☆77 Updated last year
- Implementation of the board game Codenames, re-imagined as a collaborative game between LLM agents ☆109 Updated 5 months ago
- explore token trajectory trees on instruct and base models ☆133 Updated 2 months ago
- Inference-time scaling for LLMs-as-a-judge. ☆276 Updated last month
- Plotting (entropy, varentropy) for small LMs ☆98 Updated 3 months ago
- A distributed agent orchestration framework for market agents ☆105 Updated last week
- ☆112 Updated 4 months ago
- A comprehensive repository of reasoning tasks for LLMs (and beyond) ☆448 Updated 10 months ago
- Tools to make language models a bit easier to use ☆48 Updated last month
- ☆56 Updated last month
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆146 Updated 6 months ago
- A graph visualization of attention ☆57 Updated 3 months ago
- smol models are fun too ☆92 Updated 9 months ago
- Simple UI for debugging correlations of text embeddings ☆288 Updated 2 months ago
- Train an adapter for any embedding model in under a minute ☆112 Updated 4 months ago
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆88 Updated 10 months ago
- Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**. ☆45 Updated 3 months ago
- PageRank for LLMs ☆44 Updated 4 months ago
- Verbosity control for AI agents ☆65 Updated last year
- A user interface for DSPy ☆169 Updated 2 months ago
- A library for benchmarking the Long Term Memory and Continual learning capabilities of LLM based agents. With all the tests and code you… ☆76 Updated 8 months ago
- smolLM with Entropix sampler on pytorch ☆150 Updated 9 months ago
- An automated tool for discovering insights from research paper corpora ☆138 Updated last year