Custom reward development on verl
☆178May 2, 2026Updated last month
Alternatives and similar repositories for nano_rl
Users that are interested in nano_rl are comparing it to the libraries listed below.
- TBD☆60Mar 13, 2026Updated 3 months ago
- [AAAI 2025] The official code of the paper "InverseCoder: Unleashing the Power of Instruction-Tuned Code LLMs with Inverse-Instruct"(http…☆15Jul 10, 2024Updated last year
- Policy Optimization is awesome, let’s put a tree on it! 🌲🌟☆22Jul 4, 2025Updated 11 months ago
- Official Implementation of "Probing Language Models for Pre-training Data Detection"☆20Dec 4, 2024Updated last year
- [AAAI 2026] Relation-R1: Progressively Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relation Comprehension☆20Mar 6, 2026Updated 3 months ago
- Run TRex with PPO☆39May 17, 2025Updated last year
- [ICLR 2025] Think Then React: Towards Unconstrained Action-to-Reaction Motion Generation☆20Mar 21, 2025Updated last year
- Automatically Update LLM Papers Daily using Github Actions. Ref: https://github.com/Vincentqyw/cv-arxiv-daily☆10Jun 8, 2026Updated last week
- [ICML 2026 Spotlight] Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback☆68Jun 3, 2026Updated last week
- The code of TAI'24 paper GLAC-GCN☆10Jun 11, 2024Updated 2 years ago
- An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Asy…☆9,619Jun 9, 2026Updated last week
- RLHF experiments on a single A100 40G GPU. Support PPO, GRPO, REINFORCE, RAFT, RLOO, ReMax, DeepSeek R1-Zero reproducing.☆80Feb 19, 2025Updated last year
- On the Robustness of GUI Grounding Models Against Image Attacks☆12Apr 8, 2025Updated last year
- ☆432Feb 10, 2025Updated last year
- Benchmarking Retrieval-Augmented Generation in Multi-Turn Legal Consultation Conversation☆48Mar 3, 2025Updated last year
- (best/better) practices of megatron on veRL and tuning guide☆136May 12, 2026Updated last month
- Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework☆297Jan 17, 2026Updated 4 months ago
- Source code of "Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers" EMNLP 2025☆17Jan 12, 2026Updated 5 months ago
- SFT+RL boosts multimodal reasoning☆50Jun 27, 2025Updated 11 months ago
- [EMNLP 2023 (Findings)] Schema-adaptable Knowledge Graph Construction☆22Jan 28, 2024Updated 2 years ago
- Software Engineering, BUAA course resource sharing platform☆11Apr 24, 2018Updated 8 years ago
- [ACL 2020] Structure-Level Knowledge Distillation For Multilingual Sequence Labeling☆74Nov 23, 2022Updated 3 years ago
- ☆11May 28, 2024Updated 2 years ago
- MSTI☆16Mar 6, 2024Updated 2 years ago
- ☆20Sep 23, 2018Updated 7 years ago
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models☆11Dec 13, 2023Updated 2 years ago
- ☆13Feb 17, 2025Updated last year
- Released code for「Stance Detection on Social Media with Background Knowledge」in EMNLP2023.☆20Apr 23, 2024Updated 2 years ago
- Code for ACL22 short Paper "Hierarchical Curriculum Learning for AMR Parsing"☆13Jun 1, 2022Updated 4 years ago
- Few-Shot Relation Extraction with AllenNLP☆12Jan 27, 2019Updated 7 years ago
- An easy way for debug python for Slurm HPC users.☆27Mar 23, 2025Updated last year
- ☆31Aug 18, 2025Updated 9 months ago
- ☆17Feb 20, 2020Updated 6 years ago
- [ICML2025] The official implementation of "C-3PO: Compact Plug-and-Play Proxy Optimization to Achieve Human-like Retrieval-Augmented Gene…☆45May 3, 2025Updated last year
- Official Implementation of "Learning to Refuse: Towards Mitigating Privacy Risks in LLMs"☆10Dec 13, 2024Updated last year
- A Controllable Model of Grounded Response Generation (AAAI 21)☆13Oct 25, 2022Updated 3 years ago
- ☆50Jan 26, 2026Updated 4 months ago
- This repo contains the syllabus of the Hugging Face Deep Reinforcement Learning Course translated in Chinese.☆10Jan 16, 2024Updated 2 years ago
- Advances of few-shot learning, especially for NLP applications.☆32Jan 8, 2023Updated 3 years ago