CMPhysBench / CMPhysBench
CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics
☆23 Updated last month
Alternatives and similar repositories for CMPhysBench
Users who are interested in CMPhysBench are comparing it to the repositories listed below.
- SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning ☆171 Updated 3 months ago
- Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning ☆51 Updated last month
- P1: Mastering Physics Olympiads with Reinforcement Learning ☆67 Updated last month
- official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning" ☆36 Updated 11 months ago
- DeepPHY: Benchmarking Agentic VLMs on Physical Reasoning ☆164 Updated last month
- [ACL 2025] A Generalizable and Purely Unsupervised Self-Training Framework ☆70 Updated 6 months ago
- The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions". ☆15 Updated 3 months ago
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization ☆50 Updated 5 months ago
- [EMNLP 2025 Main] AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time ☆88 Updated 6 months ago
- ☆122 Updated 3 weeks ago
- Source code for our paper: "ARIA: Training Language Agents with Intention-Driven Reward Aggregation". ☆25 Updated 4 months ago
- ☆52 Updated 7 months ago
- Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards" ☆55 Updated 2 months ago
- Official Implementation of our paper "THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning". ☆27 Updated 3 months ago
- Official eval code for ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation ☆24 Updated last week
- [AAAI26] LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs ☆48 Updated 2 weeks ago
- SSRL: Self-Search Reinforcement Learning ☆158 Updated 4 months ago
- Geometric-Mean Policy Optimization ☆95 Updated last month
- TraceRL & TraDo-8B: Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models ☆363 Updated this week
- [NeurIPS 2025] A multimodal agent that can interact with its own PC in a multimodal manner. ☆36 Updated last month
- Implementation of Negative-aware Finetuning (NFT) algorithm for "Bridging Supervised Learning and Reinforcement Learning in Math Reasonin… ☆65 Updated 3 months ago
- ☆55 Updated 6 months ago
- ☆17 Updated 11 months ago
- [NeurIPS 2025 Spotlight] Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning ☆143 Updated 3 months ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement ☆123 Updated 4 months ago
- ☆27 Updated 3 months ago
- Resa: Transparent Reasoning Models via SAEs ☆46 Updated 2 months ago
- Official PyTorch implementation and models for paper "Diffusion Beats Autoregressive in Data-Constrained Settings". We find diffusion mod… ☆115 Updated last month
- ☆63 Updated last month
- JudgeLRM: Large Reasoning Models as a Judge ☆40 Updated last week