sam-paech / diplobench
Benchmark for LLMs playing full-press Diplomacy
☆56 · Updated 10 months ago
Alternatives and similar repositories for diplobench
Users who are interested in diplobench are comparing it to the libraries listed below.
- ☆67 · Updated 6 months ago
- ☆125 · Updated last year
- Inference-time scaling for LLMs-as-a-judge. ☆327 · Updated 2 months ago
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆150 · Updated 3 weeks ago
- Train your own SOTA deductive reasoning model ☆107 · Updated 10 months ago
- ☆190 · Updated last year
- Simple UI for debugging correlations of text embeddings ☆305 · Updated 8 months ago
- ☆161 · Updated last year
- An introduction to LLM sampling ☆79 · Updated last year
- A framework for optimizing DSPy programs with RL ☆309 · Updated 2 weeks ago
- Plotting (entropy, varentropy) for small LMs ☆99 · Updated 8 months ago
- A framework for orchestrating AI agents using a Mermaid graph ☆76 · Updated last year
- PageRank for LLMs ☆52 · Updated 4 months ago
- A strongly typed Python DSL for developing message-passing multi-agent systems ☆53 · Updated last year
- An agent orchestration framework for economic agents ☆112 · Updated 5 months ago
- Use the OpenAI Batch tool to make async batch requests to the OpenAI API. ☆101 · Updated last year
- ☆134 · Updated last year
- look how they massacred my boy ☆63 · Updated last year
- Code for "Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs" ☆88 · Updated 11 months ago
- SoTA approach for ARC-AGI 2 ☆158 · Updated 4 months ago
- ⚖️ Awesome LLM Judges ⚖️ ☆148 · Updated 9 months ago
- A comprehensive repository of reasoning tasks for LLMs (and beyond) ☆457 · Updated last year
- Explore token trajectory trees on instruct and base models ☆150 · Updated 8 months ago
- ☆68 · Updated 8 months ago
- smolLM with Entropix sampler on PyTorch ☆149 · Updated last year
- Implementation of the board game Codenames, re-imagined as a collaborative game between LLM agents ☆108 · Updated 11 months ago
- Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**. ☆65 · Updated 8 months ago
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆99 · Updated 3 months ago
- Train an adapter for any embedding model in under a minute ☆130 · Updated 9 months ago
- An automated tool for discovering insights from research paper corpora ☆137 · Updated last year