sam-paech / diplobench
Benchmark for LLMs playing full-press Diplomacy
☆39 Updated 3 weeks ago
Alternatives and similar repositories for diplobench:
Users who are interested in diplobench are comparing it to the repositories listed below.
- Verdict is a library for scaling judge-time compute. ☆190 Updated 2 weeks ago
- look how they massacred my boy ☆63 Updated 5 months ago
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning ☆101 Updated this week
- Train your own SOTA deductive reasoning model ☆81 Updated 3 weeks ago
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆39 Updated last month
- A tree-based prefix cache library that allows rapid creation of looms: hierarchical branching pathways of LLM generations. ☆68 Updated last month
- Chat Markup Language conversation library ☆55 Updated last year
- ☆20 Updated 4 months ago
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo) ☆74 Updated 2 weeks ago
- Simple GRPO scripts and configurations. ☆59 Updated last month
- ☆38 Updated 8 months ago
- Tools to make language models a bit easier to use ☆39 Updated this week
- ☆55 Updated 3 weeks ago
- A strongly typed Python DSL for developing message passing multi agent systems ☆52 Updated 11 months ago
- Synthetic data derived by templating, few-shot prompting, transformations on public domain corpora, and Monte Carlo tree search. ☆31 Updated last month
- A distributed agent orchestration framework for market agents ☆84 Updated last week
- Verbosity control for AI agents ☆60 Updated 10 months ago
- ☆80 Updated 2 months ago
- Interactive timeline of AI history ☆45 Updated this week
- An introduction to LLM Sampling ☆77 Updated 3 months ago
- Code for "Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs" ☆45 Updated last month
- A framework for orchestrating AI agents using a mermaid graph ☆75 Updated 10 months ago
- ☆76 Updated 9 months ago
- ☆51 Updated last month
- Very minimal (and stateless) agent framework ☆41 Updated 2 months ago
- Conduct in-depth research with AI-driven insights: DeepDive is a command-line tool that leverages web searches and AI models to generate… ☆39 Updated 7 months ago
- ☆97 Updated 5 months ago
- ☆48 Updated last year
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ☆91 Updated 2 months ago
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆76 Updated 6 months ago