chunhuizhang / llm_rl
llm & rl
☆98Updated this week
Alternatives and similar repositories for llm_rl:
Users who are interested in llm_rl are comparing it to the libraries listed below.
- SOTA RL fine-tuning solution for advanced math reasoning of LLM☆103Updated last week
- Latest Advances on Long Chain-of-Thought Reasoning☆174Updated this week
- ☆123Updated 2 months ago
- ☆356Updated this week
- ☆124Updated 2 weeks ago
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning☆128Updated 3 months ago
- ☆110Updated 7 months ago
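- This project covers dataset synthesis, model training, and evaluation for the math problem-solving ability of large models, along with notes on related articles.☆83Updated 7 months ago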
- [ACL 2024] The official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning".☆117Updated 5 months ago
- Awesome RL-based LLM Reasoning☆418Updated this week
- ☆77Updated 4 months ago
- A repo showcasing the use of MCTS with LLMs to solve GSM8K problems☆69Updated 3 weeks ago
- Related work and background techniques for OpenAI o1☆218Updated 3 months ago
- A comprehensive collection of process reward models.☆53Updated last week
- A Survey on Multimodal Retrieval-Augmented Generation☆127Updated this week
- WWW2025 Multimodal Intent Recognition for Dialogue Systems Challenge☆120Updated 5 months ago
- Awesome RL Reasoning Recipes ("Triple R")☆322Updated this week
- ☆513Updated 3 months ago
- This is the reading list for the survey "A Survey on the Optimization of LLM-based Agents". We will keep adding papers and improving the…☆71Updated last week
- The official implementation of Natural Language Fine-Tuning☆48Updated 3 months ago
- ☆54Updated 6 months ago
- ☆74Updated 2 months ago
- ☆64Updated 6 months ago
- Awesome-Long2short-on-LRMs is a collection of state-of-the-art, novel, exciting long2short methods on large reasoning models. It contains…☆185Updated this week
- Train your grpo with zero dataset and low resources, 8bit/4bit/lora/qlora supported, multi-gpu supported ...☆69Updated this week
- A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details.☆170Updated this week
- ☆81Updated last year
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision".☆52Updated 4 months ago
- 😎 A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond☆160Updated this week
- [ACL'2024 Findings] GAOKAO-MM: A Chinese Human-Level Benchmark for Multimodal Models Evaluation☆55Updated last year