MLNLP-World / Reinforcement-Learning-Comic-Notes
Notes for learning reinforcement learning through animations
☆51 Updated 2 months ago
Alternatives and similar repositories for Reinforcement-Learning-Comic-Notes:
Users who are interested in Reinforcement-Learning-Comic-Notes are comparing it to the repositories listed below:
- Code for a New Loss for Mitigating the Bias of Learning Difficulties in Generative Language Models ☆62 Updated 2 months ago
- This project covers dataset synthesis, model training, and evaluation for the math problem-solving ability of large models, along with notes on related articles. ☆86 Updated 7 months ago
- ☆77 Updated 3 months ago
- From MHA, MQA, GQA to MLA by 苏剑林, with code ☆16 Updated 2 months ago
- Reinforcement Learning in LLM and NLP. ☆36 Updated 3 weeks ago
- ☆21 Updated last year
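- A Practical Guide to Large Language Models: Applied Practice and Real-World Deployment ☆69 Updated 7 months ago
- ☆41 Updated 3 months ago
- Fine-tuning large language models with the DPO algorithm; simple and easy to get started. ☆37 Updated 10 months ago
- llm & rl ☆115 Updated this week
- ThinkLLM: 🚀 lightweight, efficient implementations of large language model algorithms ☆45 Updated this week
- ☆72 Updated 7 months ago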
- Official Repository for SIGIR2024 Demo Paper "An Integrated Data Processing Framework for Pretraining Foundation Models" ☆80 Updated 8 months ago
- WWW2025 Multimodal Intent Recognition for Dialogue Systems Challenge ☆120 Updated 6 months ago
- Unlocking the many uses of the HuggingFace ecosystem ☆90 Updated 4 months ago
- Latest Advances on Reasoning of Multimodal Large Language Models (Multimodal R1 \ Visual R1) 🍓 ☆34 Updated last month
- LLM101n: Let's build a Storyteller (Chinese edition) ☆132 Updated 8 months ago
- Full stack LLM (Pre-training/finetuning, PPO(RLHF), Inference, Quant, etc.) ☆19 Updated 2 months ago
- Train your grpo with zero dataset and low resources, 8bit/4bit/lora/qlora supported, multi-gpu supported ... ☆71 Updated last week
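- DPO training for Tongyi Qianwen (Qwen) ☆47 Updated 7 months ago
- ☆64 Updated 3 months ago
- Notes on reproducing various LLM-related work from scratch ☆188 Updated last week
- 1.4B sLLM for Chinese and English - HammerLLM🔨 ☆44 Updated last year
- ☆40 Updated 9 months ago
- nlp_interview notes and answers: this repository mainly collects interview questions and reference answers for NLP algorithm engineers ☆20 Updated last year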
- This is a personal reimplementation of Google's Infini-transformer, utilizing a small 2b model. The project includes both model and train… ☆56 Updated last year
- ☆42 Updated 2 months ago
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning". ☆72 Updated 2 weeks ago
- The Roadmap for LLMs ☆84 Updated last year
- Interview questions and interview experiences from programmer interviews at major tech companies ☆129 Updated 4 months ago