Codebase for Iterative DPO Using Rule-based Rewards
☆270Apr 11, 2025Updated 11 months ago
Alternatives and similar repositories for Online-DPO-R1
Users that are interested in Online-DPO-R1 are comparing it to the libraries listed below.
- Recipes to train the self-rewarding reasoning LLMs.☆231Mar 2, 2025Updated last year
- This is an official implementation of the paper "Building Math Agents with Multi-Turn Iterative Preference Learning" with multi-turn DP…☆32Dec 5, 2024Updated last year
- ☆266May 14, 2025Updated 10 months ago
- A recipe for online RLHF and online iterative DPO.☆545Dec 28, 2024Updated last year
- This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re…☆39Sep 22, 2024Updated last year
- Recipes to train reward model for RLHF.☆1,521Apr 24, 2025Updated 10 months ago
- Directional Preference Alignment☆58Sep 23, 2024Updated last year
- Align Anything: Training All-modality Model with Feedback☆4,634Nov 27, 2025Updated 3 months ago
- A Speech-to-Text Input Method For Windows☆474Nov 29, 2025Updated 3 months ago
- ☆16Jul 29, 2025Updated 7 months ago
- ☆176Feb 21, 2025Updated last year
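- The Data Flow Engine is a low-level data-driven engine for data integration, data synchronization, data exchange, data sharing, task configuration, and task scheduling. It uses separation of management and execution, multiple flow layers, and a plugin library to handle practical data problems such as large-scale data tasks, high-frequency data reporting, high-frequency data collection, and heterogeneous data compatibility.☆693Mar 12, 2026Updated last week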
- Highly encapsulated for effortless usage, this state machine kernel is realized with just a single function call!☆55Aug 21, 2025Updated 7 months ago
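- The Digital Foundation is a unified, secure management support platform for the digital transformation of large government bodies and enterprises, built on identity authentication, organizational structure, positions and duties, application systems, resource roles, data catalogs, and security controls. It follows the three-administrator management model, supports microservices, multi-tenancy, containerization, and domestic (Chinese) technology stacks, lets users quickly build their own business applications with a code generator, and can also be linked with many…☆2,579Updated this week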
- ☆105Jan 24, 2025Updated last year
- An adaptive sampling framework for Reinforce-style LLM post training.☆92Nov 29, 2025Updated 3 months ago
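- Data Labeling is a tool dedicated to processing and annotating text data. Through a simplified, fast text annotation workflow and dynamic algorithmic feedback, it lets users label keywords quickly and uses algorithms to keep reducing the cost and time of manual annotation. The process starts with manual annotation to build a foundation, then automatic annotation feeds back into manual annotation, and finally manual annotation corrects deviations, greatly improving annotation accuracy and…☆696Jun 23, 2025Updated 9 months ago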
- Uncommon Objects in 3D dataset☆1,315Nov 13, 2025Updated 4 months ago
- ☆297Sep 14, 2025Updated 6 months ago
- A Python package that integrates algorithms and various machine learning approaches to extract features (genes) effective for classificati…☆252Jan 15, 2026Updated 2 months ago
- Simple RL training for reasoning☆3,841Dec 23, 2025Updated 3 months ago
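- The Workflow Engine internally provides a digital implementation of process management rules and internal business processes for organizations and government agencies; externally it provides automated third-party business triggering, interface integration, and algorithm-unit driving capabilities. Besides serving as a low-level driving engine, it gives full consideration to global transparent monitoring, security defense, and features for domestic (Chinese) technology stacks, making it a strong choice for internal process management and algorithm-driven business workflows.☆858Mar 12, 2026Updated last week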
- 🔥minerproxy: mining pool fee skimming and mining pool relay, intended for mining farm operations and maintenance.☆3,416Mar 13, 2026Updated last week
- Repo of paper "Free Process Rewards without Process Labels"☆170Mar 14, 2025Updated last year
- ☆371Sep 6, 2025Updated 6 months ago
- ☆242Jul 5, 2024Updated last year
- ☆142Nov 13, 2024Updated last year
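- AppPlatform is a cutting-edge engineering project for large-model applications that aims to simplify and optimize the development of large-model training and inference applications through integrated declarative programming and low-code configuration tools. It gives software engineers and product managers a powerful, extensible environment that supports end-to-end AI application development, from concept to deployment.☆1,424Mar 13, 2026Updated last week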
- Scalable RL solution for advanced reasoning of language models☆1,821Mar 18, 2025Updated last year
- A curated list of papers, code and resources pertaining to image composition/compositing or object insertion/addition/compositing, which …☆533Feb 24, 2026Updated 3 weeks ago
- Fullstack engineer's checklist for your cybersecurity.☆382Jul 11, 2024Updated last year
- ☆34Oct 31, 2024Updated last year
- A new AI Game Paradigm in Autonomous world. It includes configurations for agents, functional buildings, and equipment, as well as the lo…☆87Jan 12, 2025Updated last year
- TVM Documentation in Simplified Chinese (TVM 中文文档)☆3,560Mar 12, 2026Updated last week
- The next-generation deep reinforcement learning toolkit☆3,462Jun 16, 2023Updated 2 years ago
- modContact - A lightweight embedded serial framework for request<->response protocols. Features master/slave switching, multi-frame handl…☆42Jan 28, 2026Updated last month
- [VLDB'2025] LEAP: LLM-powered End-to-end Automatic Library for Processing Social Science Queries on Unstructured Data☆19Nov 3, 2025Updated 4 months ago
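- FIT: an enterprise-grade AI development framework providing a multi-language function engine (FIT), a streaming orchestration engine (WaterFlow), and a LangChain alternative for the Java ecosystem (FEL). It runs in native or Spring mode, supports hot-pluggable plugins and smart aggregated or distributed deployment, and seamlessly unifies large models with business systems.☆2,105Mar 13, 2026Updated last week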
- AML end to end system☆974Dec 7, 2024Updated last year