spinbench/spinbench

☆28

Alternatives and similar repositories for spinbench

Users who are interested in spinbench are comparing it to the repositories listed below.

TextArena / UnstableBaselines
☆120 · Apr 7, 2026 · Updated 3 months ago
mind-games-challenge / mindgames-starter-kit
The official starter-kit for NeurIPS 2025 mind games competition
☆21 · May 5, 2026 · Updated 2 months ago
spiral-rl / spiral
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning
☆199 · Mar 27, 2026 · Updated 3 months ago
openverse-ai / MEMO
MEMO: Memory-Augmented Model Context Optimization for Robust Multi-Turn Multi-Agent LLM Games
☆28 · May 10, 2026 · Updated 2 months ago
TextArena / TextArena
A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning
☆411 · Jul 3, 2026 · Updated 2 weeks ago
facebookresearch / decrypto
Implementation of the Decrypto benchmark for multi-agent reasoning and theory of mind.
☆22 · Jan 19, 2026 · Updated 6 months ago
sotopia-lab / sotopia-rl
Sotopia-RL: Reward Design for Social Intelligence
☆52 · Apr 1, 2026 · Updated 3 months ago
thu-nics / MARSHAL
[ICLR'26] MARSHAL: Incentivizing Multi-Agent Reasoning via Self-Play with Strategic LLMs
☆54 · Apr 17, 2026 · Updated 3 months ago
cmu-mind / RISE
☆34 · Oct 31, 2024 · Updated last year
S-Abdelnabi / LLM-Deliberation
Code for our NeurIPS'24 Dataset and Benchmark paper: Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiatio…
☆54 · Nov 11, 2024 · Updated last year
EmbodiedCity / Open3D-VQA.code
[ACM MM'25] Code for the paper "Open3D-VQA: A Benchmark for Embodied Spatial Reasoning with Multimodal Large Language Model in Open Space…
☆18 · Jul 9, 2026 · Updated last week
facebookresearch / ModelRatatouille
Recycling diverse models
☆47 · Jan 18, 2023 · Updated 3 years ago
Hambaobao / Marathon
Marathon: A Multiple-choice Long Context Evaluation Benchmark for Large Language Models.
☆10 · May 16, 2024 · Updated 2 years ago
VITA-Group / TAPE
[ICML'25] "Rethinking Addressing in Language Models via Contextualized Equivariant Positional Encoding" by Jiajun Zhu, Peihao Wang, Ruisi…
☆15 · Jun 6, 2025 · Updated last year
mengfeidu / EmbSpatial-Bench
☆32 · Jun 24, 2024 · Updated 2 years ago
resistzzz / Co-rewarding
[ICLR2026] "Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models"
☆30 · Feb 4, 2026 · Updated 5 months ago
ljcleo / agent_sense
Benchmarking Social Intelligence of Language Agents through Interactive Scenarios
☆13 · Jan 4, 2025 · Updated last year
rescrv / napkin
Back-of-the-envelope stuffs in Python
☆20 · Sep 13, 2023 · Updated 2 years ago
stogiannidis / srbench
Source code for the Paper "Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models"
☆19 · Feb 1, 2026 · Updated 5 months ago
zhangir-azerbayev / MetaMath
☆11 · Oct 11, 2023 · Updated 2 years ago
kowndinya-renduchintala / POSIX
POSIX: A Prompt Sensitivity Index for Language Models
☆13 · Nov 13, 2024 · Updated last year
pietrobarbiero / entropy-lens
☆18 · Mar 9, 2023 · Updated 3 years ago
CUHK-ARISE / GAMABench
Code and data for the paper: Competing Large Language Models in Multi-Agent Gaming Environments
☆98 · Jan 26, 2026 · Updated 5 months ago
ziyan-xiaoyu / SpatialMQA
☆24 · May 28, 2025 · Updated last year
conglu1997 / ACD
Automated Capability Discovery via Foundation Model Self-Exploration
☆68 · Apr 16, 2026 · Updated 3 months ago
google-deepmind / game_arena
☆109 · Feb 2, 2026 · Updated 5 months ago
ccs-amsterdam / annotinder-r
R package for working with the CCS Annotator
☆13 · Mar 14, 2024 · Updated 2 years ago
jyhong836 / llm-dp-finetune
End-to-end codebase for finetuning LLMs (LLaMA 2, 3, etc.) with or without DP
☆17 · Sep 23, 2024 · Updated last year
RaghuHemadri / Reinforcement-Learning-Reading-List
☆11 · Jul 14, 2021 · Updated 5 years ago
jinhaoduan / GTBench
[NeurIPS 2024] GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations
☆70 · Sep 6, 2024 · Updated last year
microsoft / MageBench
Official Repo for MageBench: Bridging Large Multimodal Models to Agents
☆22 · Jan 8, 2025 · Updated last year
vivekmyers / tra-ogbench
☆18 · Feb 13, 2025 · Updated last year
YoungDubbyDu / LLM-based-Multi-Agent-Systems
A summary of papers on LLM-based multi-agent systems
☆10 · Jun 23, 2024 · Updated 2 years ago
VITA-Group / TTC-Net
[ICML'26] Beyond Test-Time Memory: State-Space Optimal Control for LLM Reasoning
☆15 · Jun 1, 2026 · Updated last month
hrwise-nlp / AppBench
This is for EMNLP 2024 Paper: AppBench: Planning of Multiple APIs from Various APPs for Complex User Instruction
☆16 · Nov 4, 2024 · Updated last year
yunfeixie233 / ViGaL
☆70 · Feb 4, 2026 · Updated 5 months ago
Trae1ounG / Pretrain_Space_RLVR
[arxiv: 2604.14142] From P(y|x) to P(y): Investigating Reinforcement Learning in Pre-train Space
☆17 · Apr 16, 2026 · Updated 3 months ago
sharonal10 / langint
☆10 · Jul 4, 2024 · Updated 2 years ago
sentient-agi / werewolf-template
Template repository for the Werewolf hackathon
☆18 · Nov 9, 2024 · Updated last year