sam-paech / diplobench
Benchmark for LLMs playing full-press Diplomacy
☆56 Updated 8 months ago
Alternatives and similar repositories for diplobench
Users who are interested in diplobench are comparing it to the repositories listed below.
- PageRank for LLMs ☆51 Updated 2 months ago
- ☆62 Updated 3 months ago
- An introduction to LLM Sampling ☆79 Updated 10 months ago
- ☆124 Updated last year
- explore token trajectory trees on instruct and base models ☆148 Updated 5 months ago
- A framework for orchestrating AI agents using a mermaid graph ☆77 Updated last year
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆145 Updated 8 months ago
- look how they massacred my boy ☆63 Updated last year
- Benchmark that evaluates LLMs using 759 NYT Connections puzzles extended with extra trick words ☆156 Updated 3 weeks ago
- Synthetic data derived by templating, few-shot prompting, transformations on public domain corpora, and Monte Carlo tree search. ☆32 Updated last month
- An agent orchestration framework for economic agents ☆105 Updated 2 months ago
- Inference-time scaling for LLMs-as-a-judge. ☆307 Updated last month
- Simple UI for debugging correlations of text embeddings ☆298 Updated 5 months ago
- ☆171 Updated 10 months ago
- WIP - Allows you to create DSPy pipelines using ComfyUI ☆198 Updated 11 months ago
- ☆159 Updated 11 months ago
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆92 Updated last month
- A framework for optimizing DSPy programs with RL ☆273 Updated this week
- 📝 Reference-Free automatic summarization evaluation with potential hallucination detection ☆102 Updated last year
- ☆67 Updated last year
- Plotting (entropy, varentropy) for small LMs ☆98 Updated 5 months ago
- Code for "Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs" ☆81 Updated 8 months ago
- ☆80 Updated last year
- Super basic implementation (gist-like) of RLMs with REPL environments. ☆242 Updated 3 weeks ago
- ☆121 Updated last month
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ☆91 Updated 9 months ago
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo) ☆88 Updated this week
- SoTA Approach for ARC-AGI 2 ☆128 Updated last month
- ☆116 Updated 10 months ago
- Train your own SOTA deductive reasoning model ☆109 Updated 8 months ago