research-outcome/LLM-Game-Benchmark

Evaluating Large Language Models with Grid-Based Game Competitions: An Extensible LLM Benchmark and Leaderboard

☆23

Alternatives and similar repositories for LLM-Game-Benchmark

Users who are interested in LLM-Game-Benchmark compare it to the repositories listed below.

Joshuaclymer / GameBench
☆21 · Jun 27, 2024 · Updated last year
frivizn / news-website-template
News website template - fully responsive.
☆10 · May 11, 2021 · Updated 4 years ago
BiEchi / DistributedTrainingGPT2
Research code for various data-parallel pretraining approaches, based on PyTorch GPT-2.
☆11 · Dec 16, 2022 · Updated 3 years ago
MeiqiGuo / AKBC2021-Abg-CoQA
Repo for our AKBC-2021 paper: Abg-CoQA: Clarifying Ambiguity in Conversational Question Answering
☆10 · Oct 10, 2021 · Updated 4 years ago
bonniesjli / icm
Intrinsic Curiosity Module (ICM) + PPO on the Pyramid and PushBlock environment.
☆12 · Sep 3, 2019 · Updated 6 years ago
donggeunkwon / aes
implementation of Advanced Encryption Standard (AES) Block Cipher
☆12 · Jan 15, 2026 · Updated last month
sgl-project / whl
Kernel Library Wheel for SGLang
☆16 · Updated this week
H-Huang / torch_collective_extension
A minimum demo for PyTorch distributed extension functionality for collectives.
☆15 · Jul 29, 2024 · Updated last year
yichuan-w / MLsys_reading_list
A record of reading list on some MLsys popular topic
☆22 · Mar 20, 2025 · Updated 11 months ago
CornellHPC / HySortK
High Performance Sorting Based Distributed memory K-mer counter
☆15 · Dec 8, 2025 · Updated 2 months ago
HsLOL / ExtensionOPs
Some C++/C/CUDA Extension
☆16 · Feb 2, 2022 · Updated 4 years ago
sdan / continualcode
pip install continualcode
☆34 · Feb 10, 2026 · Updated 3 weeks ago
SalesforceAIResearch / PretrainRL-pipeline
An automated data pipeline scaling RL to pretraining levels
☆73 · Oct 11, 2025 · Updated 4 months ago
zrt / thu-xiyi
Mini program that sends a reminder when Tsinghua University dormitory washing machines are free.
☆14 · Feb 4, 2021 · Updated 5 years ago
spinbench / spinbench
☆28 · Feb 13, 2026 · Updated 3 weeks ago
maitrix-org / de-arena
Official repository for Decentralized Arena via Collective LLM Intelligence
☆17 · May 19, 2025 · Updated 9 months ago
jordddan / GameEval
Using conversational games to evaluate powerful LLMs
☆18 · Sep 3, 2023 · Updated 2 years ago
WangHanLinHenry / STeCa
(ACL2025 Findings) Official code for the paper "STeCa: Step-level Trajectory Calibration for LLM Agent Learning"
☆25 · Updated this week
tongyx361 / symeval
Evaluation utilities based on SymPy.
☆21 · Dec 12, 2024 · Updated last year
microsoft / MageBench
Official Repo for MageBench: Bridging Large Multimodal Models to Agents
☆22 · Jan 8, 2025 · Updated last year
Danielhp95 / gym-kuhn-poker
Kuhn poker implemented in accordance to OpenAI gym interface
☆14 · Dec 5, 2019 · Updated 6 years ago
sail-sg / odc
On demand communication
☆32 · Feb 26, 2026 · Updated last week
YazhouZhu19 / Partition-A-Medical-Image
[IEEE TIM 2024] Partition A Medical Image: Extracting Multiple Representative Sub-Regions for Few-shot Medical Image Segmentation
☆18 · Oct 10, 2024 · Updated last year
mignonjia / TS_watermark
☆17 · May 11, 2025 · Updated 9 months ago
nsidn98 / LLaMAR
Code for our paper LLaMAR: LM-based Long-Horizon Planner for Multi-Agent Robotics
☆30 · Feb 10, 2025 · Updated last year
ACMClassCourses / Arch2022-Notes
☆12 · Sep 12, 2023 · Updated 2 years ago
eric-ai-lab / ProbMed
[ACL 2025 Findings] "Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA"
☆25 · Feb 21, 2025 · Updated last year
why-in-Shanghaitech / pj-hf
CS274A NLP project about HuggingFace transformers. Student code release.
☆21 · May 7, 2025 · Updated 9 months ago
Twilight92z / Quantize-Watermark
☆19 · Nov 6, 2023 · Updated 2 years ago
iscas3dv / Two-Hand-Shape-Pose_v2
Interacting Two-Hand 3D Pose and Shape Reconstruction from Single Color Image (ICCV 2021)
☆21 · Feb 10, 2023 · Updated 3 years ago
2horse9sun / ucb_sp20_cs162_hw
UC Berkeley CS162 Operating System and System Programming Homework
☆19 · Aug 23, 2020 · Updated 5 years ago
Dere-Wah / AI-MarioKart64
Training DIAMOND to play MarioKart64 in a Neural Network.
☆30 · Sep 9, 2025 · Updated 5 months ago
fengerwoo / CCGate
Reverse proxy server for redistributing a Claude Code mirror / the Claude API; it can issue multiple keys and convert requests for use by CC or any Anthropic/OpenAI API-compatible application.
☆40 · Sep 1, 2025 · Updated 6 months ago
jiwonsong-dev / ReasoningPathCompression
[NeurIPS 2025] Official implementation of "Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning"
☆30 · Oct 20, 2025 · Updated 4 months ago
EleutherAI / elk-generalization
Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e…
☆28 · May 23, 2024 · Updated last year
Parallel-Reasoning / APR
[COLM 2025] Code for Paper: Learning Adaptive Parallel Reasoning with Language Models
☆142 · Dec 17, 2025 · Updated 2 months ago
thu-ml / MLA-Trust
A toolbox for benchmarking Multimodal LLM Agents trustworthiness across truthfulness, controllability, safety and privacy dimensions thro…
☆64 · Jan 9, 2026 · Updated last month
SJTU-DENG-Lab / UniCMs
☆39 · May 20, 2025 · Updated 9 months ago
Reason-Wang / NAT
[NAACL 2025] The official implementation of paper "Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language M…
☆28 · Mar 14, 2024 · Updated last year