Custom reward development on top of verl
☆145May 22, 2025Updated 10 months ago
Alternatives and similar repositories for nano_rl
Users that are interested in nano_rl are comparing it to the libraries listed below.
- TBD☆50Mar 13, 2026Updated last week
- [AAAI 2025] The official code of the paper "InverseCoder: Unleashing the Power of Instruction-Tuned Code LLMs with Inverse-Instruct"(http…☆14Jul 10, 2024Updated last year
- Policy Optimization is awesome, let’s put a tree on it! 🌲🌟☆22Jul 4, 2025Updated 8 months ago
- verl: Volcano Engine Reinforcement Learning for LLMs☆20,097Updated this week
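- A very simple reproduction of Deepseek-R1-Zero and Deepseek-R1, using the "24 Game" as an example. Uses zero-RL, SFT, and SFT+RL to elicit the LLM's self-verification and reflection abilities. About Clean, minimal, accessible reproduction of Dee…☆34Apr 5, 2025Updated 11 months ago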
- [AAAI 2026] Relation-R1: Progressively Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relation Comprehension☆18Mar 6, 2026Updated 2 weeks ago
- Official pytorch implementation of "Tool-R1: Sample-Efficient Reinforcement Learning for Agentic Tool Use"☆20Sep 16, 2025Updated 6 months ago
- Train your Agent model via our easy and efficient framework☆1,720Dec 5, 2025Updated 3 months ago
- [ASE2024] Mutual Learning-Based Framework for Enhancing Robustness of Code Models via Adversarial Training☆11Sep 13, 2024Updated last year
- Honest-but-Curious Nets: Sensitive Attributes of Private Inputs Can Be Secretly Coded into the Classifiers' Outputs (ACM CCS'21)☆17Jan 11, 2023Updated 3 years ago
- Run TRex with PPO☆39May 17, 2025Updated 10 months ago
- ☆21Oct 23, 2025Updated 5 months ago
- [ICLR 2025] Think Then React: Towards Unconstrained Action-to-Reaction Motion Generation☆19Mar 21, 2025Updated last year
- Service for Bert model to Vector. An efficient text-to-vector service that supports multi-GPU, multi-worker, and multi-client calls, and works out of the box.☆12May 24, 2022Updated 3 years ago
- Automatically Update LLM Papers Daily using Github Actions. Ref: https://github.com/Vincentqyw/cv-arxiv-daily☆10Updated this week
- Codebase for Paper Reusing Embeddings: Reproducible Reward Model Research in Large Language Model Alignment without GPUs☆22Apr 24, 2025Updated 10 months ago
- An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & TIS & vLLM & Ray & Async RL)☆9,191Mar 16, 2026Updated last week
- the open-source code of QAgent☆56Oct 14, 2025Updated 5 months ago
- Benchmarking Retrieval-Augmented Generation in Multi-Turn Legal Consultation Conversation☆38Mar 3, 2025Updated last year
- ☆19Sep 3, 2024Updated last year
- ☆425Feb 10, 2025Updated last year
- (best/better) practices of megatron on veRL and tuning guide☆132Sep 26, 2025Updated 5 months ago
- A data processing module implemented with numpy☆10Aug 16, 2022Updated 3 years ago
- ☆16Feb 25, 2026Updated 3 weeks ago
- SFT+RL boosts multimodal reasoning☆47Jun 27, 2025Updated 8 months ago
- [EMNLP 2023 (Findings)] Schema-adaptable Knowledge Graph Construction☆22Jan 28, 2024Updated 2 years ago
- This is the original matlab version of MKCFup☆10Jan 23, 2019Updated 7 years ago
- Code used in our ijcai 2019 paper "Story Ending Prediction by Transferable BERT"☆24Nov 21, 2022Updated 3 years ago
- [ACL 2025] RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios☆25Jul 2, 2025Updated 8 months ago
- Software Engineering, BUAA: course resource sharing platform☆11Apr 24, 2018Updated 7 years ago
- ☆25Dec 29, 2025Updated 2 months ago
- [ACL 2020] Structure-Level Knowledge Distillation For Multilingual Sequence Labeling☆72Nov 23, 2022Updated 3 years ago
- ☆11May 28, 2024Updated last year
- MSTI☆16Mar 6, 2024Updated 2 years ago
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models☆11Dec 13, 2023Updated 2 years ago
- Named entity recognition using a multi-head approach☆34May 5, 2021Updated 4 years ago
- ☆13Feb 17, 2025Updated last year
- Released code for「Stance Detection on Social Media with Background Knowledge」in EMNLP2023.☆19Apr 23, 2024Updated last year
- Code for ACL22 short Paper "Hierarchical Curriculum Learning for AMR Parsing"☆13Jun 1, 2022Updated 3 years ago