yuanzhoulvpi2017 / nano_rl
Custom reward development on top of verl
☆54 Updated last month
Alternatives and similar repositories for nano_rl
Users who are interested in nano_rl are comparing it to the libraries listed below.
- ☆44 Updated 4 months ago
- ☆81 Updated last year
- Fine-tune large language models with the DPO algorithm; simple and easy to get started. ☆39 Updated 11 months ago
- Code for a New Loss for Mitigating the Bias of Learning Difficulties in Generative Language Models ☆62 Updated 4 months ago
- Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation ☆79 Updated 7 months ago
- ☆28 Updated last year
- ☆141 Updated last year
- ☆111 Updated 11 months ago
- Llama-3-SynE: A Significantly Enhanced Version of Llama-3 with Advanced Scientific Reasoning and Chinese Language Capabilities | Continued pre-training to improve … ☆33 Updated 3 weeks ago
- ☆41 Updated 10 months ago
- The code and data of DPA-RAG, accepted by WWW 2025 main conference. ☆61 Updated 5 months ago
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning ☆144 Updated 6 months ago
- The official implementation of ACL'24 paper: Synergistic Interplay between Search and Large Language Models for Information Retrieval. ☆34 Updated last year
- [ACL 2024] The official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning". ☆122 Updated 7 months ago
- [ICML'2024] Can AI Assistants Know What They Don't Know? ☆81 Updated last year
- Train an LLM from scratch on a single 24 GB GPU ☆55 Updated last month
- Advanced interview notes on large language models ☆52 Updated last month
- Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning" ☆125 Updated last week
- ☆57 Updated 8 months ago
- The demo, code and data of FollowRAG ☆73 Updated 2 months ago
- Fantastic Data Engineering for Large Language Models ☆89 Updated 5 months ago
- a-m-team's exploration in large language modeling ☆160 Updated 3 weeks ago
- 🔍 Awesome Agentic Search is a curated list of papers, tools, and resources on agentic search, where AI agents plan, search, and reason to… ☆31 Updated last week
- [EMNLP 2024] The official GitHub repo for the survey paper "Knowledge Conflicts for LLMs: A Survey" ☆125 Updated 9 months ago
- ☆97 Updated last year
- Official repository for "PIP-KAG: Mitigating Knowledge Conflicts in Knowledge-Augmented Generation via Parametric Pruning" ☆32 Updated this week
- A comprehensive collection of process reward models. ☆92 Updated 2 weeks ago
- How to train an LLM tokenizer ☆150 Updated last year
- [ACL-25] We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs. ☆63 Updated 7 months ago
- llm & rl ☆151 Updated this week