On-Policy Distillation Built on Top of Verl
☆87May 25, 2026Updated last month
Alternatives and similar repositories for OPSD_OnPolicyDistillation
Users that are interested in OPSD_OnPolicyDistillation are comparing it to the libraries listed below.
- Official repo for "TC-AE: Unlocking Token Capacity for Deep Compression Autoencoders"☆25Apr 9, 2026Updated 2 months ago
- ☆17Apr 11, 2025Updated last year
- A Recipe for Building LLM Reasoners to Solve Complex Instructions☆32Oct 9, 2025Updated 8 months ago
- [ACL 2026]From Experience to Skill: Multi-Agent Generative Engine Optimization via Reusable Strategy Learning☆39Apr 26, 2026Updated 2 months ago
- Py Minutiae Viewer is a cross-platform, multi-format minutiae viewer.☆12Dec 5, 2019Updated 6 years ago
- We introduce Reasoning via Video, a new paradigm that uses maze-solving video generation to probe multimodal reasoning; our VR-Bench show…☆65Feb 4, 2026Updated 4 months ago
- Python numerical optimization toolbox☆12Nov 20, 2018Updated 7 years ago
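- Mainly builds on the practice code provided with Dr. Gao's "视觉SLAM十四讲" (14 Lectures on Visual SLAM), adding some code the author regularly uses.☆17Mar 11, 2021Updated 5 years ago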
- [ICML2026] ARLArena☆82May 2, 2026Updated last month
- ☆32Aug 11, 2025Updated 10 months ago
- ☆30Jul 22, 2024Updated last year
- TBD☆62Mar 13, 2026Updated 3 months ago
- Vero: An Open RL Recipe for General Visual Reasoning☆129Jun 19, 2026Updated last week
- Implementation of the pictorial structures algorithm☆14Jul 23, 2014Updated 11 years ago
- Official Implementation of "Semantics-Consistent Feature Search for Self-Supervised Visual Representation Learning" in AAAI2024.☆13Feb 28, 2024Updated 2 years ago
- FakePartsBench: 25K+ AI-generated videos with pixel- and frame-level annotations of full and partial deepfakes.☆25May 29, 2026Updated 3 weeks ago
- ICML 2025 Spotlight, PCEvolve: Private Contrastive Evolution for Synthetic Dataset Generation via Few-Shot Private Data and Generative AP…☆14Jun 27, 2025Updated last year
- [AAAI 2026] Test-Time Reinforcement Learning for GUI Grounding via Region Consistency https://arxiv.org/abs/2508.05615☆67Nov 8, 2025Updated 7 months ago
- PyTorch Implementation for InMaP☆12Oct 28, 2023Updated 2 years ago
- ☆17Jun 10, 2025Updated last year
- [Arxiv 2025] Official code and datasets of paper: GNNs as Predictors of Agentic Workflow Performances☆20Jan 15, 2026Updated 5 months ago
- RLAnything (ICML 2026) & AutoTool (ICML 2026), DemyAgent: Open-Source RL for LLMs and Agentic Scenarios☆557Jun 12, 2026Updated 2 weeks ago
- Code for "Contrast then Memorize: Semantic Neighbor Retrieval-Enhanced Inductive Multimodal Knowledge Graph Completion", SIGIR 2024.☆14Feb 20, 2025Updated last year
- [NeurIPS 2024 Oral] "Bayesian-Guided Label Mapping for Visual Reprogramming"☆12Dec 20, 2024Updated last year
- Code of "Instruction Multi-Constraint Molecular Generation Using a Teacher-Student Large Language Model"☆13Jul 8, 2025Updated 11 months ago
- collab-dev - Collaboration Metrics for Code Reviews☆23May 12, 2025Updated last year
- Official repository for the paper "Beyond Static Tools: Test-Time Tool Evolution for Scientific Reasoning" and the SciEvo benchmark.☆44Jan 13, 2026Updated 5 months ago
- LLM Reasoning Benchmark & Chain-of-Thoughts Dataset for Chemistry☆54Oct 9, 2025Updated 8 months ago
- [AAAI 2023] Pytorch Implementation for AAAI2023 paper: One-for-All: Proposal Masked Cross-Class Anomaly Detection☆15Oct 31, 2024Updated last year
- [NeurIPS 2025] Let LRMs Break Free from Overthinking via Self-Braking Tuning. https://arxiv.org/abs/2505.14604☆54Nov 4, 2025Updated 7 months ago
- [NeurIPS 2025] Mind the Gap: Bridging Thought Leap for Improved CoT Tuning https://arxiv.org/abs/2505.14684☆48Oct 20, 2025Updated 8 months ago
- Implementation of related angular-margin-based classification loss functions for training (face) embedding models: SphereFace, CosFace, A…☆26May 21, 2024Updated 2 years ago
- Official repository for the paper "Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation"☆246May 28, 2026Updated last month
- Custom ComfyUI node that combines VSR + VFI and allows streaming processing for arbitrary video length.☆66Mar 28, 2026Updated 3 months ago
- [ICLR 2026] The official repository for paper "ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning"☆189May 1, 2026Updated last month
- Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge☆131May 24, 2026Updated last month
- Use MCL within Mirai Console to manage packages and access other advanced features☆10Nov 13, 2022Updated 3 years ago
- [arXiv 2026] Official PyTorch Repository for "Coarse-Guided Visual Generation via Weighted h-Transform Sampling"☆42May 8, 2026Updated last month
- ☆15Nov 20, 2023Updated 2 years ago