sam-paech / diplobench
Benchmark for LLMs playing full press diplomacy
☆49Updated 3 months ago
Alternatives and similar repositories for diplobench
Users that are interested in diplobench are comparing it to the libraries listed below
- A distributed agent orchestration framework for market agents☆102Updated this week
- ☆38Updated 11 months ago
- look how they massacred my boy☆63Updated 8 months ago
- The history files when recording human interaction while solving ARC tasks☆112Updated 2 weeks ago
- ☆121Updated 10 months ago
- ☆54Updated 4 months ago
- A framework for optimizing DSPy programs with RL☆76Updated this week
- ☆86Updated 5 months ago
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments☆82Updated 8 months ago
- Train your own SOTA deductive reasoning model☆94Updated 3 months ago
- 📝 Reference-Free automatic summarization evaluation with potential hallucination detection☆100Updated last year
- ☆63Updated last month
- An introduction to LLM Sampling☆78Updated 6 months ago
- One click away from a locally downloaded, fine-tuned model, hosted on hugging face, with inference built in. In two hours.☆22Updated 3 months ago
- Scale your LLM-as-a-judge.☆240Updated 2 weeks ago
- A framework for orchestrating AI agents using a mermaid graph☆76Updated last year
- A user interface for DSPy☆160Updated last month
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna☆53Updated 4 months ago
- Simple UI for debugging correlations of text embeddings☆283Updated 3 weeks ago
- An automated tool for discovering insights from research paper corpora☆138Updated last year
- A pure MLX-based training pipeline for fine-tuning LLMs using GRPO on Apple Silicon.☆41Updated 4 months ago
- ☆77Updated last year
- PageRank for LLMs☆42Updated 2 months ago
- Easiest way to give context to LLMs; Attachments has the ambition to be the general funnel for any files to be transformed into images+te…☆190Updated last week
- A graph visualization of attention☆56Updated last month
- LMQL implementation of tree of thoughts☆34Updated last year
- Verbosity control for AI agents☆63Updated last year
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025)☆91Updated 5 months ago
- ☆47Updated last year
- Implementation of the board game Codenames, re-imagined as a collaborative game between LLM agents☆108Updated 4 months ago