sam-paech / diplobench
Benchmark for LLMs playing full press diplomacy
☆57Updated 8 months ago
Alternatives and similar repositories for diplobench
Users who are interested in diplobench are comparing it to the libraries listed below.
- ☆124Updated last year
- ☆66Updated 4 months ago
- Plotting (entropy, varentropy) for small LMs☆99Updated 6 months ago
- A framework for orchestrating AI agents using a mermaid graph☆77Updated last year
- An agent orchestration framework for economic agents☆108Updated 3 months ago
- The history files when recording human interaction while solving ARC tasks☆118Updated 2 weeks ago
- ☆122Updated 2 months ago
- An introduction to LLM Sampling☆79Updated 11 months ago
- PageRank for LLMs☆51Updated 2 months ago
- Implementation of the board game Codenames, re-imagined as a collaborative game between LLM agents☆108Updated 9 months ago
- explore token trajectory trees on instruct and base models☆148Updated 6 months ago
- ☆178Updated 11 months ago
- Simple UI for debugging correlations of text embeddings☆301Updated 6 months ago
- Inference-time scaling for LLMs-as-a-judge.☆312Updated 3 weeks ago
- A comprehensive repository of reasoning tasks for LLMs (and beyond)☆451Updated last year
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments☆93Updated last month
- ☆67Updated last year
- look how they massacred my boy☆63Updated last year
- A user interface for DSPy☆196Updated last month
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo)☆88Updated this week
- Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning☆234Updated 9 months ago
- WIP - Allows you to create DSPy pipelines using ComfyUI☆199Updated 11 months ago
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens☆146Updated 9 months ago
- ☆127Updated 11 months ago
- lossily compress representation vectors using product quantization☆59Updated last month
- An automated tool for discovering insights from research paper corpora☆137Updated last year
- This repository explains and provides examples for "concept anchoring" in GPT4.☆71Updated last year
- ☆92Updated last year
- Aidan Bench attempts to measure <big_model_smell> in LLMs.☆315Updated 5 months ago
- Interactive timeline of AI history☆63Updated 2 months ago