Custom reward development on top of verl
☆148May 22, 2025Updated 10 months ago
Alternatives and similar repositories for nano_rl
Users who are interested in nano_rl are comparing it to the libraries listed below.
- TBD☆53Mar 13, 2026Updated last month
- [AAAI 2025] The official code of the paper "InverseCoder: Unleashing the Power of Instruction-Tuned Code LLMs with Inverse-Instruct"(http…☆14Jul 10, 2024Updated last year
- ☆15Nov 22, 2023Updated 2 years ago
- ☆81Jun 23, 2025Updated 9 months ago
- Official Implementation of "Probing Language Models for Pre-training Data Detection"☆20Dec 4, 2024Updated last year
- [AAAI 2026] Relation-R1: Progressively Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relation Comprehension☆18Mar 6, 2026Updated last month
- Train your Agent model via our easy and efficient framework☆1,727Dec 5, 2025Updated 4 months ago
- "Topological Identification and Interpretation for Single-cell Gene Regulation Elucidation across Multiple Platforms using scMGCA" in Nat…☆17Feb 5, 2023Updated 3 years ago
- [ACL 2024] IEPile: A Large-Scale Information Extraction Corpus☆212Jan 9, 2025Updated last year
- Service for Bert model to Vector. An efficient text-to-vector service that supports multi-GPU, multi-worker, and multi-client calls, ready to use out of the box.☆13May 24, 2022Updated 3 years ago
- Automatically Update LLM Papers Daily using Github Actions. Ref: https://github.com/Vincentqyw/cv-arxiv-daily☆10Updated this week
- ☆63Mar 8, 2026Updated last month
- Codebase for Paper Reusing Embeddings: Reproducible Reward Model Research in Large Language Model Alignment without GPUs☆22Apr 24, 2025Updated 11 months ago
- Official repository for the paper "Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation"☆88Mar 18, 2026Updated 3 weeks ago
- An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & TIS & vLLM & Ray & Async RL)☆9,315Updated this week
- RLHF experiments on a single A100 40G GPU. Support PPO, GRPO, REINFORCE, RAFT, RLOO, ReMax, DeepSeek R1-Zero reproducing.☆79Feb 19, 2025Updated last year
- ☆19Sep 3, 2024Updated last year
- On the Robustness of GUI Grounding Models Against Image Attacks☆12Apr 8, 2025Updated last year
- Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework☆266Jan 17, 2026Updated 2 months ago
- ☆427Feb 10, 2025Updated last year
- (best/better) practices of megatron on veRL and tuning guide☆132Sep 26, 2025Updated 6 months ago
- Towards Training-free Open-world Segmentation via Image Prompt Foundation Models☆18Nov 22, 2024Updated last year
- Source code of "Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers" EMNLP 2025☆17Jan 12, 2026Updated 3 months ago
- SFT+RL boosts multimodal reasoning☆48Jun 27, 2025Updated 9 months ago
- This is the original matlab version of MKCFup☆10Jan 23, 2019Updated 7 years ago
- Code used in our ijcai 2019 paper "Story Ending Prediction by Transferable BERT"☆24Nov 21, 2022Updated 3 years ago
- [ACL 2025] RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios☆26Jul 2, 2025Updated 9 months ago
- Software Engineering, BUAA: course resource sharing platform☆11Apr 24, 2018Updated 7 years ago
- [ACL 2020] Structure-Level Knowledge Distillation For Multilingual Sequence Labeling☆72Nov 23, 2022Updated 3 years ago
- ☆20Sep 23, 2018Updated 7 years ago
- MSTI☆16Mar 6, 2024Updated 2 years ago
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models☆11Dec 13, 2023Updated 2 years ago
- Named entity recognition using a multi-head approach☆34May 5, 2021Updated 4 years ago
- Released code for「Stance Detection on Social Media with Background Knowledge」in EMNLP2023.☆20Apr 23, 2024Updated last year
- Code for ACL22 short Paper "Hierarchical Curriculum Learning for AMR Parsing"☆13Jun 1, 2022Updated 3 years ago
- Few-Shot Relation Extraction with AllenNLP☆12Jan 27, 2019Updated 7 years ago
- An easy way to debug Python for Slurm HPC users.☆28Mar 23, 2025Updated last year
- ☆31Aug 18, 2025Updated 7 months ago
- This repo contains implementation of deep learning-based steel surface defect segmentation models. Extensive experiments on several deep …☆22May 19, 2025Updated 10 months ago