Codebase for Iterative DPO Using Rule-based Rewards
☆273Apr 11, 2025Updated last year
Alternatives and similar repositories for Online-DPO-R1
Users that are interested in Online-DPO-R1 are comparing it to the libraries listed below.
- Recipes to train self-rewarding reasoning LLMs.☆232Mar 2, 2025Updated last year
- This is an official implementation of the paper "Building Math Agents with Multi-Turn Iterative Preference Learning" with multi-turn DP…☆32Dec 5, 2024Updated last year
- ☆273May 14, 2025Updated last year
- A recipe for online RLHF and online iterative DPO.☆544Dec 28, 2024Updated last year
- This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re…☆42Sep 22, 2024Updated last year
- Directional Preference Alignment☆62Sep 23, 2024Updated last year
- Recipes to train reward model for RLHF.☆1,534Apr 24, 2025Updated last year
- Align Anything: Training All-modality Model with Feedback☆4,662Nov 27, 2025Updated 7 months ago
- A Speech-to-Text Input Method For Windows☆473Nov 29, 2025Updated 7 months ago
- ☆16Jul 29, 2025Updated 11 months ago
- ☆175Feb 21, 2025Updated last year
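- Data Flow Engine is a low-level, data-driven engine for data integration, data synchronization, data exchange, data sharing, task configuration, and task scheduling. It uses separation of management and execution, multiple flow layers, and a plugin library to handle large-scale data tasks, high-frequency data reporting, high-frequency data collection, and heterogeneous data compatibility.☆694May 14, 2026Updated last month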
- Highly encapsulated for effortless usage, this state machine kernel is realized with just a single function call!☆58Aug 21, 2025Updated 10 months ago
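- Digital Foundation is a unified, secure management support platform for the digital transformation of large government bodies and enterprises, built on identity authentication, organizational structure, positions and duties, application systems, resource roles, data catalogs, security controls, and related functions. Based on the three-administrator management model, it supports microservices, multi-tenancy, containerization, and domestic (Chinese) technology stacks, lets users quickly build their own business applications with a code generator, and can also be linked with…☆2,597Updated this week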
- ☆105Jan 24, 2025Updated last year
- Uncommon Objects in 3D dataset☆1,337Nov 13, 2025Updated 7 months ago
- ☆296Sep 14, 2025Updated 9 months ago
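- Data Annotation is a tool dedicated to processing and annotating text data. With a streamlined annotation workflow and dynamic algorithmic feedback, it lets users label keywords quickly and keeps reducing the cost and time of manual annotation. Manual annotation first builds the foundation, automatic annotation then feeds back into manual annotation, and manual annotation finally corrects errors, greatly improving annotation accuracy and…☆695Jun 23, 2025Updated last year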
- A Python package that integrates algorithms and various machine learning approaches to extract features (genes) effective for classificati…☆251Jan 15, 2026Updated 5 months ago
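- Workflow Engine: internally, it provides a digital implementation of an organization's or agency's process management rules and internal business processes; externally, it provides automated driving of third-party services, interface integration, and algorithm-unit driving. Alongside the underlying engine, it gives full consideration to global transparent monitoring, security defense, and domestic (Chinese) technology features, making it well suited to internal process management and business algorithm driving.☆861Updated this week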
- Simple RL training for reasoning☆3,871Dec 23, 2025Updated 6 months ago
- An adaptive sampling framework for Reinforce-style LLM post training.☆96Nov 29, 2025Updated 7 months ago
- 🔥minerproxy: mining pool fee skimming, mining pool relay, for mining farm operations.☆3,707May 22, 2026Updated last month
- Repo of paper "Free Process Rewards without Process Labels"☆171Mar 14, 2025Updated last year
- ☆370Apr 1, 2026Updated 3 months ago
- ☆242Jun 16, 2026Updated 2 weeks ago
- ☆142Nov 13, 2024Updated last year
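- AppPlatform is a cutting-edge large-model application engineering project that aims to simplify and optimize the development of large-model training and inference applications through integrated declarative programming and low-code configuration tools. It gives software engineers and product managers a powerful, extensible environment that supports AI application development across the full process, from concept to deployment.☆1,439May 18, 2026Updated last month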
- A curated list of papers, code and resources pertaining to image composition/compositing or object/subject insertion/addition/compositing…☆536Apr 30, 2026Updated 2 months ago
- Scalable RL solution for advanced reasoning of language models☆1,864Mar 18, 2025Updated last year
- Fullstack engineer's checklist for your cybersecurity.☆382Jul 11, 2024Updated last year
- A new AI game paradigm in an autonomous world. It includes configurations for agents, functional buildings, and equipment, as well as the lo…☆89Jan 12, 2025Updated last year
- ☆34Oct 31, 2024Updated last year
- modContact - A lightweight embedded serial framework for request<->response protocols. Features master/slave switching, multi-frame handl…☆43Jan 28, 2026Updated 5 months ago
- The next-generation deep reinforcement learning toolkit☆3,463Jun 16, 2023Updated 3 years ago
- [VLDB'2025] LEAP: LLM-powered End-to-end Automatic Library for Processing Social Science Queries on Unstructured Data☆20Nov 3, 2025Updated 8 months ago
- TVM documentation in Simplified Chinese☆3,818May 20, 2026Updated last month
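- FIT: an enterprise-grade AI development framework providing a multi-language function engine (FIT), a streaming orchestration engine (WaterFlow), and a LangChain alternative for the Java ecosystem (FEL). It runs in native or Spring mode, supports hot-swappable plugins and intelligent aggregated or distributed deployment, and seamlessly unifies large models with business systems.☆2,109Mar 13, 2026Updated 3 months ago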
- Efficient DiT architecture for text2any tasks, ICLR2025☆446May 10, 2025Updated last year