meituan-longcat / AMO-Bench
This is the official repo for the paper "AMO-Bench: Large Language Models Still Struggle in High School Math Competitions".
☆62 · Feb 6, 2026 · Updated last week
Alternatives and similar repositories for AMO-Bench
Users who are interested in AMO-Bench are comparing it to the repositories listed below:
- ☆33 · Nov 18, 2025 · Updated 2 months ago
- ☆16 · Sep 4, 2025 · Updated 5 months ago
- ☆17 · May 21, 2025 · Updated 8 months ago
- ☆24 · May 23, 2025 · Updated 8 months ago
- [NeurIPS 2024] The official implementation of "Image Copy Detection for Diffusion Models" ☆18 · Oct 1, 2024 · Updated last year
- [NeurIPS'25] The official code of "PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning" ☆30 · Jan 12, 2026 · Updated last month
- ☆41 · Jan 4, 2026 · Updated last month
- Official implementation of Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More ☆24 · Feb 25, 2025 · Updated 11 months ago
- ☆15 · Feb 21, 2024 · Updated last year
- Official Repo for SvS: A Self-play with Variational Problem Synthesis strategy for RLVR training ☆54 · Dec 13, 2025 · Updated 2 months ago
- REverse-Engineered Reasoning for Open-Ended Generation ☆91 · Sep 10, 2025 · Updated 5 months ago
- ☆31 · Sep 12, 2025 · Updated 5 months ago
- The official repository of the Omni-MATH benchmark. ☆93 · Dec 22, 2024 · Updated last year
- Official Repo for SwS: A Weakness-driven Problem Synthesis Framework in RL for LLM Reasoning ☆42 · Nov 11, 2025 · Updated 3 months ago
- ☆46 · Jun 24, 2025 · Updated 7 months ago
- Modality Gap–Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models ☆48 · Updated this week
- ☆60 · Jan 12, 2026 · Updated last month
- A Text2SQL benchmark for evaluation of Large Language Models ☆41 · Updated this week
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w… ☆13 · Jun 28, 2025 · Updated 7 months ago
- Official repo for the paper "Learning From Mistakes Makes LLM Better Reasoner" ☆60 · Dec 20, 2023 · Updated 2 years ago
- RewardAnything: Generalizable Principle-Following Reward Models ☆45 · Jun 11, 2025 · Updated 8 months ago
- Official Implementation for the paper "Integrative Decoding: Improving Factuality via Implicit Self-consistency" ☆32 · Apr 12, 2025 · Updated 10 months ago
- [NeurIPS ENLSP Workshop'24] CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios ☆16 · Oct 18, 2024 · Updated last year
- ☆18 · Jun 10, 2025 · Updated 8 months ago
- [ACL 2025] Are Your LLMs Capable of Stable Reasoning? ☆32 · Aug 5, 2025 · Updated 6 months ago
- ☆352 · Jul 29, 2025 · Updated 6 months ago
- Instruction-following benchmark for large reasoning models ☆44 · Aug 9, 2025 · Updated 6 months ago
- ThinkGen: Generalized Thinking for Visual Generation ☆51 · Dec 30, 2025 · Updated last month
- JudgeLRM: Large Reasoning Models as a Judge ☆41 · Jan 29, 2026 · Updated 2 weeks ago
- Official implementation of "Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation" (CVPR 202… ☆40 · May 26, 2025 · Updated 8 months ago
- [ICML 2024] Official Repository for the paper "Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models" ☆10 · Jul 19, 2024 · Updated last year
- ☆11 · Jun 22, 2025 · Updated 7 months ago
- A Framework for Evaluating AI Agent Safety in Realistic Environments ☆30 · Oct 2, 2025 · Updated 4 months ago
- [ICCV 2025] Official Implementation of Steering Rectified Flow Models in the Vector Field for Controlled Image Generation ☆44 · Jun 27, 2025 · Updated 7 months ago
- The code repository of UniRL ☆51 · May 30, 2025 · Updated 8 months ago
- The official implementation of the paper "DaMo: Data Mixing Optimizer in Fine-tuning Multimodal LLMs for Mobile Phone Agents" ☆28 · Oct 23, 2025 · Updated 3 months ago
- Symphony — A decentralized multi-agent framework that enables intelligent agents to collaborate seamlessly across heterogeneous edge devi… ☆30 · Oct 30, 2025 · Updated 3 months ago
- ☆40 · Jan 14, 2025 · Updated last year
- Search Self-Play: Pushing the Frontier of Agent Capability without Supervision ☆90 · Jan 6, 2026 · Updated last month