neulab / VisualPuzzles
☆18 Updated 2 months ago
Alternatives and similar repositories for VisualPuzzles
Users that are interested in VisualPuzzles are comparing it to the libraries listed below
- [arxiv: 2512.19673] Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies ☆59 Updated this week
- [AAAI 2026] Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning". ☆95 Updated 3 months ago
- ☆21 Updated 9 months ago
- [ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning ☆70 Updated 6 months ago
- Resources for the Enigmata Project. ☆77 Updated 5 months ago
- A repo for open research on building large reasoning models ☆136 Updated 2 weeks ago
- A Self-Training Framework for Vision-Language Reasoning ☆88 Updated last year
- A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models ☆27 Updated last year
- Towards a Unified View of Large Language Model Post-Training ☆201 Updated 5 months ago
- ☆43 Updated 5 months ago
- ☆64 Updated last week
- Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay ☆148 Updated 8 months ago
- ☆51 Updated 9 months ago
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision". ☆55 Updated last year
- A comprehensive collection of learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward model… ☆63 Updated 7 months ago
- ☆72 Updated 8 months ago
- ☆56 Updated 11 months ago
- ☆135 Updated 2 weeks ago
- ☆352 Updated 6 months ago
- Source code for our paper: "ARIA: Training Language Agents with Intention-Driven Reward Aggregation". ☆26 Updated 6 months ago
- RM-R1: Unleashing the Reasoning Potential of Reward Models ☆158 Updated 7 months ago
- [ICML 2025] Official Implementation of GLIDER ☆72 Updated 4 months ago
- ☆47 Updated 10 months ago
- ☆13 Updated last year
- ☆108 Updated 2 months ago
- [ACL'25] We propose a novel fine-tuning method, Separate Memory and Reasoning, which combines prompt tuning with LoRA. ☆84 Updated 3 months ago
- Code repo for "Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning" ☆32 Updated 6 months ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆109 Updated 8 months ago
- Official code for paper "SPA-RL: Reinforcing LLM Agent via Stepwise Progress Attribution" ☆62 Updated 4 months ago
- ☆51 Updated last year