sam-paech / diplobench
Benchmark for LLMs playing full press diplomacy
☆50Updated 4 months ago
Alternatives and similar repositories for diplobench
Users who are interested in diplobench are comparing it to the repositories listed below
- A framework for optimizing DSPy programs with RL☆91Updated this week
- ☆56Updated last week
- A framework for orchestrating AI agents using a mermaid graph☆77Updated last year
- ☆122Updated 11 months ago
- Inference-time scaling for LLMs-as-a-judge.☆251Updated this week
- An introduction to LLM Sampling☆79Updated 7 months ago
- A distributed agent orchestration framework for market agents☆102Updated 3 weeks ago
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments☆85Updated 9 months ago
- Simple Graph Memory for AI applications☆88Updated 2 months ago
- look how they massacred my boy☆63Updated 9 months ago
- ☆154Updated 7 months ago
- Plotting (entropy, varentropy) for small LMs☆97Updated 2 months ago
- ☆77Updated last year
- Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**.☆42Updated 2 months ago
- ☆106Updated 3 months ago
- A user interface for DSPy☆162Updated last month
- Examining how large language models (LLMs) perform across various synthetic regression tasks when given (input, output) examples in their…☆150Updated 10 months ago
- PageRank for LLMs☆43Updated 3 months ago
- Train an adapter for any embedding model in under a minute☆106Updated 3 months ago
- The history files when recording human interaction while solving ARC tasks☆113Updated this week
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025)☆91Updated 5 months ago
- Tools to make language models a bit easier to use☆48Updated 2 weeks ago
- 📝 Reference-Free automatic summarization evaluation with potential hallucination detection☆101Updated last year
- An automated tool for discovering insights from research paper corpora☆138Updated last year
- ☆132Updated 7 months ago
- A graph visualization of attention☆56Updated 2 months ago
- ☆64Updated last month
- Implementation of the board game Codenames, re-imagined as a collaborative game between LLM agents☆108Updated 4 months ago
- Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning☆225Updated 4 months ago
- Train your own SOTA deductive reasoning model☆99Updated 4 months ago