cnsdqd-dyb / Guide-GRPO
Optimizes language models through guidance tokens in reasoning chains, aiming for memory-efficient training (24 GB VRAM) on consumer GPUs. Based on DeepSeekRL-Extended.
☆30 Updated 3 months ago
Alternatives and similar repositories for Guide-GRPO
Users who are interested in Guide-GRPO compare it to the repositories listed below.
- Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models ☆36 Updated 2 weeks ago
- ☆111 Updated last week
- GPT as a Monte Carlo Language Tree: A Probabilistic Perspective ☆44 Updated 4 months ago
- Official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning" ☆35 Updated 4 months ago
- ICLR2024 statistics ☆47 Updated last year
- [ICLR'25] PiCO: Peer Review in LLMs based on the Consistency Optimization, https://arxiv.org/pdf/2402.01830 ☆36 Updated 3 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆43 Updated 3 months ago
- PhysGame Benchmark for Physical Commonsense Evaluation in Gameplay Videos ☆44 Updated 3 weeks ago
- [ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models ☆74 Updated 8 months ago
- Assessing Context-Aware Creative Intelligence in MLLMs ☆19 Updated 2 months ago
- Doodling our way to AGI ✏️ 🖼️ 🧠 ☆50 Updated last week
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆57 Updated 7 months ago
- ☆46 Updated 2 months ago
- (VillagerAgent ACL 2024) A Graph based Minecraft multi agents framework ☆64 Updated 2 weeks ago
- The official repository for the Scientific Paper Idea Proposer (SciPIP) ☆62 Updated 3 months ago
- MMSci: A Multimodal Multi-Discipline Dataset for PhD-Level Scientific Comprehension ☆44 Updated 6 months ago
- [NIPS2023] Implementation of Foundation Model is Efficient Multimodal Multitask Model Selector ☆36 Updated last year
- Empowering Unified MLLM with Multi-granular Visual Generation ☆124 Updated 4 months ago
- ☆17 Updated 7 months ago
- (ACL 2025) MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale ☆44 Updated this week
- ☆48 Updated 2 months ago
- Official Repository of LatentSeek ☆30 Updated 2 weeks ago
- Official implementation of Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning ☆16 Updated 7 months ago
- [NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos ☆35 Updated 2 months ago
- [ECCV 2024] STEVE in Minecraft is for See and Think: Embodied Agent in Virtual Environment ☆38 Updated last year
- V1: Toward Multimodal Reasoning by Designing Auxiliary Task ☆34 Updated last month
- ☆105 Updated 2 months ago
- Official Implementation of Muddit [Meissonic II]: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model. ☆47 Updated last week
- (ICLR2025 Spotlight) DEEM: Official implementation of Diffusion models serve as the eyes of large language models for image perception. ☆34 Updated 2 months ago
- Syphus: Automatic Instruction-Response Generation Pipeline ☆14 Updated last year