Custom reward development on verl
☆151May 22, 2025Updated 11 months ago
Alternatives and similar repositories for nano_rl
Users that are interested in nano_rl are comparing it to the libraries listed below.
- TBD☆56Mar 13, 2026Updated last month
- Policy Optimization is awesome, let’s put a tree on it! 🌲🌟☆22Jul 4, 2025Updated 10 months ago
- [ACL2026 Findings] GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning☆81Jun 23, 2025Updated 10 months ago
- "Topological Identification and Interpretation for Single-cell Gene Regulation Elucidation across Multiple Platforms using scMGCA" in Nat…☆18Feb 5, 2023Updated 3 years ago
- [ICLR'26] RM-R1: Unleashing the Reasoning Potential of Reward Models☆163Jun 26, 2025Updated 10 months ago
- [ICLR 2025] Think Then React: Towards Unconstrained Action-to-Reaction Motion Generation☆20Mar 21, 2025Updated last year
- Service for Bert model to Vector. An efficient text-to-vector service that supports multi-GPU, multi-worker, and multi-client calls, ready to use out of the box.☆13May 24, 2022Updated 3 years ago
- Automatically Update LLM Papers Daily using Github Actions. Ref: https://github.com/Vincentqyw/cv-arxiv-daily☆10Updated this week
- Codebase for Paper Reusing Embeddings: Reproducible Reward Model Research in Large Language Model Alignment without GPUs☆22Apr 24, 2025Updated last year
- The code of TAI'24 paper GLAC-GCN☆10Jun 11, 2024Updated last year
- ☆19Sep 3, 2024Updated last year
- Code for "Reasoning to Learn from Latent Thoughts"☆130Mar 28, 2025Updated last year
- On the Robustness of GUI Grounding Models Against Image Attacks☆12Apr 8, 2025Updated last year
- ☆430Feb 10, 2025Updated last year
- Benchmarking Retrieval-Augmented Generation in Multi-Turn Legal Consultation Conversation☆41Mar 3, 2025Updated last year
- Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework☆279Jan 17, 2026Updated 3 months ago
- (best/better) practices of megatron on veRL and tuning guide☆132Updated this week
- ☆19Feb 25, 2026Updated 2 months ago
- Towards Training-free Open-world Segmentation via Image Prompt Foundation Models☆18Nov 22, 2024Updated last year
- Source code of "Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers" EMNLP 2025☆17Jan 12, 2026Updated 3 months ago
- [EMNLP 2023 (Findings)] Schema-adaptable Knowledge Graph Construction☆22Jan 28, 2024Updated 2 years ago
- Code used in our ijcai 2019 paper "Story Ending Prediction by Transferable BERT"☆24Nov 21, 2022Updated 3 years ago
- Software Engineering, BUAA: course resource sharing platform☆11Apr 24, 2018Updated 8 years ago
- MSTI☆16Mar 6, 2024Updated 2 years ago
- ☆20Sep 23, 2018Updated 7 years ago
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models☆11Dec 13, 2023Updated 2 years ago
- ☆13Feb 17, 2025Updated last year
- Released code for「Stance Detection on Social Media with Background Knowledge」in EMNLP2023.☆20Apr 23, 2024Updated 2 years ago
- Code for ACL22 short Paper "Hierarchical Curriculum Learning for AMR Parsing"☆13Jun 1, 2022Updated 3 years ago
- Few-Shot Relation Extraction with AllenNLP☆12Jan 27, 2019Updated 7 years ago
- Official Implementation of "Learning to Refuse: Towards Mitigating Privacy Risks in LLMs"☆10Dec 13, 2024Updated last year
- A Controllable Model of Grounded Response Generation (AAAI 21)☆13Oct 25, 2022Updated 3 years ago
- Phase-aware Adversarial Defense for Improving Adversarial Robustness☆11Oct 12, 2023Updated 2 years ago
- Advances of few-shot learning, especially for NLP applications.☆32Jan 8, 2023Updated 3 years ago
- Official Repository of "Learning what reinforcement learning can't"☆84Dec 30, 2025Updated 4 months ago
- [WWW 2021] Target-adaptive Graph for Cross-target Stance Detection☆16Dec 15, 2021Updated 4 years ago
- Code for NeurIPS 2024 Paper "Fight Back Against Jailbreaking via Prompt Adversarial Tuning"☆22May 6, 2025Updated 11 months ago
- Framework for testing models with AI2 leaderboards☆21Nov 8, 2023Updated 2 years ago
- A comprehensive survey of the World Law Agent ecosystem — AI + Law☆29Mar 14, 2026Updated last month