changyeyu / LLM-RL-Visualized
LLM, RL, DPO, SFT, Distillation, Alignment. By the author of the book 📘 "Large Model Algorithms" (《大模型算法》)
☆55 Updated last month
Alternatives and similar repositories for LLM-RL-Visualized
Users that are interested in LLM-RL-Visualized are comparing it to the libraries listed below
- ☆83 Updated 8 months ago
- llm & rl ☆151 Updated this week
- ☆15 Updated 7 months ago
- This is a repo for showcasing using MCTS with LLMs to solve gsm8k problems ☆84 Updated 3 months ago
- [NeurIPS 2023] We use large language models as commonsense world model and heuristic policy within Monte-Carlo Tree Search, enabling bett… ☆277 Updated 7 months ago
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning ☆186 Updated 3 months ago
- Train your grpo with zero dataset and low resources, 8bit/4bit/lora/qlora supported, multi-gpu supported ... ☆73 Updated last month
- ☆337 Updated 4 months ago
- RLHF experiments on a single A100 40G GPU. Support PPO, GRPO, REINFORCE, RAFT, RLOO, ReMax, DeepSeek R1-Zero reproducing. ☆64 Updated 4 months ago
- A comprehensive list of PAPERS, CODEBASES, and DATASETS on Decision Making using Foundation Models including LLMs and VLMs. ☆371 Updated last year
- A comprehensive collection of process reward models. ☆92 Updated 2 weeks ago
- Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning" ☆125 Updated last week
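- DeepSpeed tutorials, annotated examples, and study notes (efficient training of large models) ☆169 Updated last year
- Code for AUTOPLAN, published in Acta Automatica Sinica (自动化学报); uses large language models for task planning and task execution on complex tasks ☆102 Updated 7 months ago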
- An Easy-to-use, Scalable and High-performance RLHF Framework designed for Multimodal Models. ☆130 Updated 2 months ago
- Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (… ☆127 Updated this week
- ☆242 Updated last month
- ☆242 Updated 3 weeks ago
- A curated list of RL resources ☆40 Updated last year
- ☆40 Updated last week
- ☆241 Updated 2 weeks ago
- Notes on multimodal knowledge for large language model (LLM) algorithm (application) engineers ☆206 Updated last year
- A New Approach to Solving SMAC Task: Generating Decision Tree Code from Large Language Models ☆42 Updated 2 months ago
- ICLR 2025 Agent-Related Papers ☆70 Updated 7 months ago
- Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models) ☆185 Updated last year
- ☆145 Updated 5 months ago
- Official Repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning ☆366 Updated 6 months ago
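- Panning for gold in 💩 ☆19 Updated this week
- Book: "Multimodal Large Models: A New-Generation Artificial Intelligence Technology Paradigm" (《多模态大模型:新一代人工智能技术范式》), by Liu Yang (刘阳) and Lin Liang (林倞) ☆216 Updated 6 months ago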
- ⭐️ Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning. ☆164 Updated 2 weeks ago