research-outcome / LLM-Game-Benchmark
Evaluating Large Language Models with Grid-Based Game Competitions: An Extensible LLM Benchmark and Leaderboard
☆23 Dec 14, 2024 Updated last year
Alternatives and similar repositories for LLM-Game-Benchmark
Users that are interested in LLM-Game-Benchmark are comparing it to the libraries listed below
- ☆10 Feb 9, 2024 Updated 2 years ago
- News website template - fully responsive. ☆10 May 11, 2021 Updated 4 years ago
- ☆11 Oct 11, 2023 Updated 2 years ago
- implementation of Advanced Encryption Standard (AES) Block Cipher ☆12 Jan 15, 2026 Updated 3 weeks ago
- Repo for our AKBC-2021 paper: Abg-CoQA: Clarifying Ambiguity in Conversational Question Answering ☆10 Oct 10, 2021 Updated 4 years ago
- Kernel Library Wheel for SGLang ☆17 Updated this week
- Source code for the Paper "Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models" ☆18 Feb 1, 2026 Updated last week
- A record of reading list on some MLsys popular topic ☆21 Mar 20, 2025 Updated 10 months ago
- Some C++/C/CUDA Extension ☆16 Feb 2, 2022 Updated 4 years ago
- An automated data pipeline scaling RL to pretraining levels ☆72 Oct 11, 2025 Updated 4 months ago
- ☆20 Jun 9, 2025 Updated 8 months ago
- Learning to Model Pixel-Embedded Affinity for Homogeneous Instance Segmentation ☆12 Jul 16, 2023 Updated 2 years ago
- ☆28 Updated this week
- Official repository for Decentralized Arena via Collective LLM Intelligence ☆17 May 19, 2025 Updated 8 months ago
- (ACL2025 Findings) Official code for the paper "STeCa: Step-level Trajectory Calibration for LLM Agent Learning" ☆25 Updated this week
- Using conversational games to evaluate powerful LLMs ☆18 Sep 3, 2023 Updated 2 years ago
- Code for our paper LLaMAR: LM-based Long-Horizon Planner for Multi-Agent Robotics ☆28 Feb 10, 2025 Updated last year
- Evaluation utilities based on SymPy. ☆21 Dec 12, 2024 Updated last year
- ECCV 2024 DTC Dataset Tooling ☆22 Jan 12, 2026 Updated last month
- On demand communication ☆32 Feb 4, 2026 Updated last week
- ☆12 Sep 12, 2023 Updated 2 years ago
- ☆18 Oct 23, 2024 Updated last year
- (ACL 2025 Main) Code for MultiAgentBench : Evaluating the Collaboration and Competition of LLM agents https://www.arxiv.org/pdf/2503.019… ☆32 Jun 21, 2025 Updated 7 months ago
- [ACL 2025 Findings] "Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA" ☆25 Feb 21, 2025 Updated 11 months ago
- HPC-Lab for the High Performance Computing course, Spring 2023, Tsinghua University (Introduction to High Performance Computing @ THU). ☆24 Jun 13, 2023 Updated 2 years ago
- Training DIAMOND to play MarioKart64 in a Neural Network. ☆31 Sep 9, 2025 Updated 5 months ago
- [NeurIPS 2025] Official implementation of "Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning" ☆30 Oct 20, 2025 Updated 3 months ago
- A Model Context Protocol (MCP) server implementation for Google Calendar integration. Create and manage calendar events directly through … ☆33 Mar 18, 2025 Updated 10 months ago
- ☆39 May 20, 2025 Updated 8 months ago
- preprocessing tools for multi-modal 3D brain imaging ☆30 Jan 29, 2026 Updated 2 weeks ago
- [NAACL 2025] The official implementation of paper "Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language M… ☆28 Mar 14, 2024 Updated last year
- Official repository for ReasonGen-R1 ☆74 Jun 23, 2025 Updated 7 months ago
- Sotopia-RL: Reward Design for Social Intelligence ☆46 Jan 29, 2026 Updated 2 weeks ago
- Recent Advances in Vision-Language Pre-training! ☆31 Jan 10, 2022 Updated 4 years ago
- Tutorial for Ray ☆36 Mar 31, 2024 Updated last year
- [ICLR 2024] Seer: Language Instructed Video Prediction with Latent Diffusion Models ☆34 May 23, 2024 Updated last year
- ☆38 Feb 27, 2023 Updated 2 years ago
- ☆41 Jan 9, 2024 Updated 2 years ago
- Minimal but scalable implementation of large language models in JAX ☆35 Nov 28, 2025 Updated 2 months ago