Evaluating Large Language Models with Grid-Based Game Competitions: An Extensible LLM Benchmark and Leaderboard
☆23 · Dec 14, 2024 · Updated last year
Alternatives and similar repositories for LLM-Game-Benchmark
Users that are interested in LLM-Game-Benchmark are comparing it to the libraries listed below
- ☆21 · Jun 27, 2024 · Updated last year
- News website template - fully responsive. ☆10 · May 11, 2021 · Updated 4 years ago
- Research code for various data-parallel pretraining approaches based on PyTorch GPT-2. ☆11 · Dec 16, 2022 · Updated 3 years ago
- Repo for our AKBC-2021 paper: Abg-CoQA: Clarifying Ambiguity in Conversational Question Answering ☆10 · Oct 10, 2021 · Updated 4 years ago
- Intrinsic Curiosity Module (ICM) + PPO on the Pyramid and PushBlock environment. ☆12 · Sep 3, 2019 · Updated 6 years ago
- implementation of Advanced Encryption Standard (AES) Block Cipher ☆12 · Jan 15, 2026 · Updated last month
- Kernel Library Wheel for SGLang ☆16 · Updated this week
- A minimum demo for PyTorch distributed extension functionality for collectives. ☆15 · Jul 29, 2024 · Updated last year
- A record of reading list on some MLsys popular topic ☆22 · Mar 20, 2025 · Updated 11 months ago
- High Performance Sorting Based Distributed memory K-mer counter ☆15 · Dec 8, 2025 · Updated 2 months ago
- Some C++/C/CUDA Extension ☆16 · Feb 2, 2022 · Updated 4 years ago
- pip install continualcode ☆34 · Feb 10, 2026 · Updated 3 weeks ago
- An automated data pipeline scaling RL to pretraining levels ☆73 · Oct 11, 2025 · Updated 4 months ago
- Mini-program that notifies users when Tsinghua University dormitory washing machines are free ☆14 · Feb 4, 2021 · Updated 5 years ago
- ☆28 · Feb 13, 2026 · Updated 3 weeks ago
- Official repository for Decentralized Arena via Collective LLM Intelligence ☆17 · May 19, 2025 · Updated 9 months ago
- Using conversational games to evaluate powerful LLMs ☆18 · Sep 3, 2023 · Updated 2 years ago
- (ACL2025 Findings) Official code for the paper "STeCa: Step-level Trajectory Calibration for LLM Agent Learning" ☆25 · Updated this week
- Evaluation utilities based on SymPy. ☆21 · Dec 12, 2024 · Updated last year
- Official Repo for MageBench: Bridging Large Multimodal Models to Agents ☆22 · Jan 8, 2025 · Updated last year
- Kuhn poker implemented in accordance with the OpenAI Gym interface ☆14 · Dec 5, 2019 · Updated 6 years ago
- On demand communication ☆32 · Feb 26, 2026 · Updated last week
- [IEEE TIM 2024] Partition A Medical Image: Extracting Multiple Representative Sub-Regions for Few-shot Medical Image Segmentation ☆18 · Oct 10, 2024 · Updated last year
- ☆17 · May 11, 2025 · Updated 9 months ago
- Code for our paper LLaMAR: LM-based Long-Horizon Planner for Multi-Agent Robotics ☆30 · Feb 10, 2025 · Updated last year
- ☆12 · Sep 12, 2023 · Updated 2 years ago
- [ACL 2025 Findings] "Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA" ☆25 · Feb 21, 2025 · Updated last year
- CS274A NLP project about HuggingFace transformers. Student code release. ☆21 · May 7, 2025 · Updated 9 months ago
- ☆19 · Nov 6, 2023 · Updated 2 years ago
- Interacting Two-Hand 3D Pose and Shape Reconstruction from Single Color Image (ICCV 2021) ☆21 · Feb 10, 2023 · Updated 3 years ago
- UC Berkeley CS162 Operating System and System Programming Homework ☆19 · Aug 23, 2020 · Updated 5 years ago
- Training DIAMOND to play MarioKart64 in a Neural Network. ☆30 · Sep 9, 2025 · Updated 5 months ago
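- Claude Code mirror / reverse-proxy server for redistributing the Claude API; it can split access across multiple keys and convert requests for use by CC or any Anthropic/OpenAI API-compatible application ☆40 · Sep 1, 2025 · Updated 6 months ago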
- [NeurIPS 2025] Official implementation of "Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning" ☆30 · Oct 20, 2025 · Updated 4 months ago
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e… ☆28 · May 23, 2024 · Updated last year
- [COLM 2025] Code for Paper: Learning Adaptive Parallel Reasoning with Language Models ☆142 · Dec 17, 2025 · Updated 2 months ago
- A toolbox for benchmarking Multimodal LLM Agents trustworthiness across truthfulness, controllability, safety and privacy dimensions thro… ☆64 · Jan 9, 2026 · Updated last month
- ☆39 · May 20, 2025 · Updated 9 months ago
- [NAACL 2025] The official implementation of paper "Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language M… ☆28 · Mar 14, 2024 · Updated last year