ljc010717 / GRPO2025
☆23 · Updated 6 months ago
Alternatives and similar repositories for GRPO2025
Users that are interested in GRPO2025 are comparing it to the libraries listed below
- A live reading list for LLM data synthesis (updated through July 2025). ☆399 · Updated 2 months ago
- ☆166 · Updated last year
- LLM & RL ☆236 · Updated last week
- Custom reward development on verl ☆122 · Updated 5 months ago
- Full-parameter fine-tuning, LoRA fine-tuning, and QLoRA fine-tuning of llama3. ☆210 · Updated last year
- RAG paper study notes ☆177 · Updated 7 months ago
- A Survey on Multimodal Retrieval-Augmented Generation ☆398 · Updated 2 weeks ago
- ☆102 · Updated 4 months ago
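- Kaggle 2024 Eedi: 10th-place gold-medal solution ☆43 · Updated 10 months ago
- This repository mainly records reading notes on top-conference papers relevant to LLM algorithm engineers (multimodal, PEFT, few-shot QA, RAG, LMM interpretability, Agents, CoT) ☆362 · Updated last year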
- Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning ☆848 · Updated 3 months ago
- Reinforcement Learning in LLM and NLP. ☆61 · Updated last month
- Personal ChatGPT ☆388 · Updated 10 months ago
- An Awesome List of Agentic Models Trained with Reinforcement Learning ☆527 · Updated 2 weeks ago
- TinyRAG ☆355 · Updated 4 months ago
- WWW2025 Multimodal Intent Recognition for Dialogue Systems Challenge ☆127 · Updated 11 months ago
- ☆57 · Updated last year
- ☆367 · Updated 2 weeks ago
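- EasyOffer (<A Collection of LLM Interview Experiences>) is an LLM summer-internship offer guide tailored for newcomers to large models. It mainly records common hand-written coding problems, interview experiences, and common thought questions from major tech companies, for summer internship and fall recruitment preparation. I'm a beginner still learning... If you spot any problems, experts, please feel free to correct me; I hope everyone lands their desired Of… ☆534 · Updated 7 months ago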
- ☆548 · Updated 10 months ago
- This repository collects awesome surveys, resources, and papers on Lifelong Learning for Large Language Models. (Updated Regularly) ☆67 · Updated 5 months ago
- Latest Advances on Long Chain-of-Thought Reasoning ☆537 · Updated 3 months ago
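- Large language model applications: RAG, NL2SQL, chatbots, pre-training, MoE (mixture-of-experts) models, fine-tuning, reinforcement learning, Tianchi data competitions ☆71 · Updated 8 months ago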
- ☆414 · Updated 3 weeks ago
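- Welcome to LLM-Dojo, an open-source place for learning about large models. It uses concise, easy-to-read code to build a model training framework (supporting mainstream models such as Qwen, Llama, GLM, and more), an RLHF framework (DPO/CPO/KTO/PPO), and various other features. 👩🎓👨🎓 ☆888 · Updated this week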
- HIPPO: Enhancing the Table Understanding Capability of Large Language Models through Hybrid-Modal Preference Optimization ☆14 · Updated 5 months ago
- [NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other mo… ☆398 · Updated 4 months ago
- ☆65 · Updated 5 months ago
- ☆27 · Updated 3 months ago
- A One-Stop Reward Model Platform ☆76 · Updated this week