yuanzhoulvpi2017 / nano_rl
Custom reward development on verl
☆144 · May 22, 2025 · Updated 8 months ago
Alternatives and similar repositories for nano_rl
Users that are interested in nano_rl are comparing it to the libraries listed below
- TBD ☆39 · Feb 3, 2026 · Updated 2 weeks ago
- Policy Optimization is awesome, let’s put a tree on it! 🌲🌟 ☆22 · Jul 4, 2025 · Updated 7 months ago
- [AAAI 2025] The official code of the paper "InverseCoder: Unleashing the Power of Instruction-Tuned Code LLMs with Inverse-Instruct" (http… ☆14 · Jul 10, 2024 · Updated last year
- [EMNLP 2023 (Findings)] Schema-adaptable Knowledge Graph Construction ☆22 · Jan 28, 2024 · Updated 2 years ago
- ☆18 · Apr 20, 2025 · Updated 9 months ago
- A method to make VLMs think like R1 ☆21 · May 28, 2025 · Updated 8 months ago
- Official Implementation of "Simulating Environments with Reasoning Models for Agent Training" ☆57 · Updated this week
- Framework for testing models with AI2 leaderboards ☆21 · Nov 8, 2023 · Updated 2 years ago
- ☆20 · Sep 23, 2018 · Updated 7 years ago
- Milvus tutorials ☆20 · May 9, 2022 · Updated 3 years ago
- Code used in our IJCAI 2019 paper "Story Ending Prediction by Transferable BERT" ☆24 · Nov 21, 2022 · Updated 3 years ago
- SFT+RL boosts multimodal reasoning ☆46 · Jun 27, 2025 · Updated 7 months ago
- [ICLR 2025] ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning https://arxiv.org/abs/2501.06590 ☆80 · Jul 31, 2025 · Updated 6 months ago
- AIR retriever for Multi-Hop QA (ACL 2020 paper) ☆30 · Jul 18, 2020 · Updated 5 years ago
- Natural Language Generation by Hierarchical Decoding with Linguistic Patterns (NAACL-HLT 2018), Investigating Linguistic Pattern Ordering… ☆32 · Sep 23, 2018 · Updated 7 years ago
- RM-R1: Unleashing the Reasoning Potential of Reward Models ☆159 · Jun 26, 2025 · Updated 7 months ago
- Spark: Python study notes ☆11 · Sep 25, 2018 · Updated 7 years ago
- [ASE 2024] Mutual Learning-Based Framework for Enhancing Robustness of Code Models via Adversarial Training ☆11 · Sep 13, 2024 · Updated last year
- A set of tools that make working with the Scala ecosystem even better. ☆12 · Feb 10, 2026 · Updated last week
- An open-source session replay tool for single-page applications that uses AI analysis, aggregated trends, and a RAG chatbot to help devel… ☆11 · Jan 23, 2026 · Updated 3 weeks ago
- Advances of few-shot learning, especially for NLP applications. ☆32 · Jan 8, 2023 · Updated 3 years ago
- [ACL 2020] Structure-Level Knowledge Distillation For Multilingual Sequence Labeling ☆72 · Nov 23, 2022 · Updated 3 years ago
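- A Practical Guide to Large Language Models: Applications and Real-World Deployment ☆88 · Sep 13, 2024 · Updated last year
- Named entity recognition using a multi-head approach ☆34 · May 5, 2021 · Updated 4 years ago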
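- Java library that provides NumPy-like functionality in Java ☆22 · Oct 23, 2024 · Updated last year
- A data development platform contributed by APEX and built on big data platform capabilities. It helps enterprises connect data, build and accumulate data warehouse models at minimal cost, lower the barrier to using data, and retain the value of their data. ☆12 · Oct 31, 2024 · Updated last year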
- Automatic defect recognition in X-ray testing using computer vision ☆12 · Dec 8, 2018 · Updated 7 years ago
- Simplifies data migration between Apache Ignite clusters by relying on Apache Avro as an intermediate storage format ☆13 · Jun 27, 2023 · Updated 2 years ago
- ☆13 · Feb 17, 2025 · Updated last year
- AQIPython is a Python module that calculates the Air Quality Index (AQI) for various air pollutants based on different standards. ☆10 · Mar 5, 2024 · Updated last year
- Breast cancer data mining with Python and sklearn ☆11 · Apr 13, 2019 · Updated 6 years ago
- A repo showcasing MCTS with LLMs to solve GSM8K problems ☆95 · Nov 13, 2025 · Updated 3 months ago
- Lightweight local website for displaying the performance of different chat models. ☆87 · Nov 13, 2023 · Updated 2 years ago
- ☆15 · Feb 10, 2025 · Updated last year
- [ICML 2025] The official implementation of "C-3PO: Compact Plug-and-Play Proxy Optimization to Achieve Human-like Retrieval-Augmented Gene… ☆42 · May 3, 2025 · Updated 9 months ago
- On the Robustness of GUI Grounding Models Against Image Attacks ☆12 · Apr 8, 2025 · Updated 10 months ago
- A Python wrapper for the QuantAQ RESTful API ☆11 · Dec 24, 2025 · Updated last month
- Add a cute LuoTianyi to your cnblogs! (live2d-v3-model) ☆11 · Apr 23, 2024 · Updated last year
- ☆14 · Jun 15, 2023 · Updated 2 years ago