Codebase for Iterative DPO Using Rule-based Rewards
☆269Apr 11, 2025Updated 10 months ago
Alternatives and similar repositories for Online-DPO-R1
Users who are interested in Online-DPO-R1 are comparing it to the libraries listed below.
- Recipes to train self-rewarding reasoning LLMs.☆231Mar 2, 2025Updated last year
- ☆264May 14, 2025Updated 9 months ago
- A recipe for online RLHF and online iterative DPO.☆540Dec 28, 2024Updated last year
- Recipes to train reward models for RLHF.☆1,517Apr 24, 2025Updated 10 months ago
- This is an official implementation of the paper "Building Math Agents with Multi-Turn Iterative Preference Learning" with multi-turn DP…☆32Dec 5, 2024Updated last year
- Highly encapsulated for effortless usage, this state machine kernel is realized with just a single function call!☆55Aug 21, 2025Updated 6 months ago
- Directional Preference Alignment☆58Sep 23, 2024Updated last year
- Align Anything: Training All-modality Model with Feedback☆4,636Nov 27, 2025Updated 3 months ago
- ☆176Feb 21, 2025Updated last year
- A Speech-to-Text Input Method For Windows☆474Nov 29, 2025Updated 3 months ago
- ☆104Jan 24, 2025Updated last year
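- The Data Flow Engine is a low-level data-driven engine for data integration, data synchronization, data exchange, data sharing, task configuration, and task scheduling. It uses an architecture with separation of management and execution, multiple flow layers, and a plugin library to handle practical data problems such as large-scale data tasks, high-frequency data reporting, high-frequency data collection, and heterogeneous data compatibility.☆692Updated this week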
- modContact - A lightweight embedded serial framework for request<->response protocols. Features master/slave switching, multi-frame handl…☆42Jan 28, 2026Updated last month
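- Data Labeling is a tool dedicated to processing and annotating text data. Through a streamlined text annotation workflow and dynamic algorithmic feedback, it lets users annotate keywords quickly and uses algorithms to keep reducing the cost and time of manual annotation. Manual annotation first builds the foundation, automatic annotation then feeds back into manual annotation, and manual annotation finally corrects deviations, greatly improving annotation accuracy and…☆695Jun 23, 2025Updated 8 months ago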
- ☆297Sep 14, 2025Updated 5 months ago
- ☆142Nov 13, 2024Updated last year
- A Python package that integrates algorithms and various machine learning approaches to extract features (genes) effective for classificati…☆252Jan 15, 2026Updated last month
- ☆10Dec 25, 2024Updated last year
- Repo of paper "Free Process Rewards without Process Labels"☆169Mar 14, 2025Updated 11 months ago
- ☆242Jul 5, 2024Updated last year
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning☆193Mar 20, 2025Updated 11 months ago
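- The Digital Foundation is a unified and secure management support platform for the digital transformation of large government bodies and enterprises, built on identity authentication, organizational structure, positions and duties, application systems, resource roles, data catalogs, security controls, and related functions. It is based on a three-administrator management model, supports microservices, multi-tenancy, containerization, and domestic (Chinese) technology stacks, lets users quickly build their own business applications with a code generator, and can also be linked with…☆2,574Updated this week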
- Uncommon Objects in 3D dataset☆1,312Nov 13, 2025Updated 3 months ago
- ☆371Sep 6, 2025Updated 5 months ago
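- Internally, the Workflow Engine provides a digital implementation of an organization's or agency's process management rules and internal business processes; externally, it provides automated third-party business driving, interface integration, and algorithm-unit driving capabilities. Besides providing the underlying driving engine, it gives full consideration to global transparent monitoring, security defense, and features for domestic (Chinese) technology stacks, making it an ideal choice for internal process management and algorithm-driven business.☆857Updated this week
- A new AI Game Paradigm in Autonomous world. It includes configurations for agents, functional buildings, and equipment, as well as the lo…☆87Jan 12, 2025Updated last year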
- Fullstack engineer's checklist for your cybersecurity.☆381Jul 11, 2024Updated last year
- A curated list of papers, code and resources pertaining to image composition/compositing or object insertion/addition/compositing, which …☆533Updated this week
- Simple RL training for reasoning☆3,830Dec 23, 2025Updated 2 months ago
- ☆33Oct 31, 2024Updated last year
- Reproduce R1 Zero on Logic Puzzle☆2,439Mar 20, 2025Updated 11 months ago
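- AppPlatform is a cutting-edge large-model application project that aims to simplify and optimize the development of large-model training and inference applications through integrated declarative programming and low-code configuration tools. It gives software engineers and product managers a powerful, extensible environment that supports end-to-end AI application development, from concept to deployment.☆1,422Updated this week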
- Scalable RL solution for advanced reasoning of language models☆1,809Mar 18, 2025Updated 11 months ago
- ☆249Jul 19, 2023Updated 2 years ago
- Some tools for cloud developers☆407Aug 30, 2024Updated last year
- GlucoInsight: Framework for Glucose Management Application☆84Aug 6, 2024Updated last year
- ☆75Feb 17, 2025Updated last year
- [CVPR 2025] Official implementation for "Empowering LLMs to Understand and Generate Complex Vector Graphics" https://arxiv.org/abs/2412.1…☆615May 22, 2025Updated 9 months ago
- https://zourunfa.github.io/guitar-elf/ The idea is to create a guitar tool, including guitar tuning, chord calculation, etc☆19Jun 26, 2025Updated 8 months ago