GoodStartLabs / AI_Diplomacy
Frontier Models playing the board game Diplomacy.
☆628 Updated last month
Alternatives and similar repositories for AI_Diplomacy
Users that are interested in AI_Diplomacy are comparing it to the libraries listed below
- Atropos is a Language Model Reinforcement Learning Environments framework for collecting and evaluating LLM trajectories through diverse … ☆853 Updated this week
- Testing baseline LLMs performance across various models ☆336 Updated this week
- Async RL Training at Scale ☆1,044 Updated this week
- An interface library for RL post training with environments. ☆1,132 Updated this week
- A comprehensive repository of reasoning tasks for LLMs (and beyond) ☆458 Updated last year
- [NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards ☆1,332 Updated 3 weeks ago
- ☆483 Updated 6 months ago
- Build your own visual reasoning model ☆418 Updated 3 weeks ago
- Aidan Bench attempts to measure <big_model_smell> in LLMs. ☆318 Updated 7 months ago
- Provider-agnostic, open-source evaluation infrastructure for language models ☆719 Updated last month
- ☆562 Updated 7 months ago
- Super basic implementation (gist-like) of RLMs with REPL environments. ☆592 Updated last month
- gpt-2 from scratch in mlx ☆414 Updated last year
- Official CLI and Python SDK for Prime Intellect - access GPU compute, remote sandboxes, RL environments, and distributed training infrast… ☆151 Updated this week
- Lightly-reviewed collection of community environments ☆210 Updated 2 weeks ago
- System 2 Reasoning Link Collection ☆870 Updated 10 months ago
- An alignment auditing agent capable of quickly exploring alignment hypothesis ☆874 Updated last week
- Distributed Training Over-The-Internet ☆975 Updated 3 months ago
- The missing tiktoken training code ☆342 Updated last month
- Darwin Gödel Machine: Open-Ended Evolution of Self-Improving Agents ☆1,819 Updated 5 months ago
- ☆190 Updated last year
- ComplexTensor: Machine Learning By Bridging Classical and Quantum Computation ☆78 Updated last year
- Inference-time scaling for LLMs-as-a-judge. ☆328 Updated 3 months ago
- Collection of scripts and notebooks for OpenAI's latest GPT OSS models ☆496 Updated 5 months ago
- ☆313 Updated last month
- Deep learning for dummies. All the practical details and useful utilities that go into working with real models. ☆829 Updated 6 months ago
- This repository allows reproduction of Poetiq's record-breaking submission to the ARC-AGI-1 and ARC-AGI-2 benchmarks. ☆1,197 Updated last month
- A Tree Search Library with Flexible API for LLM Inference-Time Scaling ☆515 Updated 2 months ago
- Hypernetworks that adapt LLMs for specific benchmark tasks using only textual task description as the input ☆940 Updated 8 months ago
- A benchmark to evaluate language models on questions I've previously asked them to solve. ☆1,042 Updated 9 months ago