[ICLR 2026] Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs
☆41May 20, 2025Updated 10 months ago
Alternatives and similar repositories for AR-Lopti
Users who are interested in AR-Lopti are comparing it to the libraries listed below.
- [ICLR 2024] DMBP: Diffusion Model-Based Predictor for Robust Offline Reinforcement Learning against State Observations Perturbations.☆17May 24, 2024Updated last year
- [ICLR 2025] SiMHand: Mining Similar Hands for Large-Scale 3D Hand Pose Pre-training☆40Apr 4, 2025Updated 11 months ago
- ☆77Jun 28, 2025Updated 8 months ago
- [ICLR 2025 Spotlight] Official PyTorch Implementation of "What Makes a Good Diffusion Planner for Decision Making?"☆80Apr 20, 2025Updated 11 months ago
- Official Repository of "Learning to Reason under Off-Policy Guidance"☆426Mar 20, 2026Updated last week
- ☆21Mar 18, 2026Updated last week
- Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation☆20Jun 11, 2025Updated 9 months ago
- [ICLR 2025] Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization☆32Jan 7, 2026Updated 2 months ago
- PyTorch codes for the paper "An Empirical Study of Multimodal Model Merging"☆37Oct 11, 2023Updated 2 years ago
- The official implementation of "Helen: Optimizing CTR Prediction Models with Frequency-wise Hessian Eigenvalue Regularization"☆16Mar 14, 2024Updated 2 years ago
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization☆53Jul 15, 2025Updated 8 months ago
- The official implementation of the NeurIPS 2024 paper "SubgDiff: A Subgraph Diffusion Model to Improve Molecular Representation Learning."☆11May 28, 2025Updated 9 months ago
- The official implementation of InfoRM [NeurIPS 2024].☆15Oct 25, 2025Updated 5 months ago
- Implementation codes for NeurIPS23 paper "Spectral Invariant Learning for Dynamic Graphs under Distribution Shifts"☆14Mar 19, 2024Updated 2 years ago
- Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization☆81Dec 25, 2025Updated 3 months ago
- DNA-D2S: a systematic error simulation Model for DNA Data Storage channel☆12Feb 14, 2022Updated 4 years ago
- ☆14Jun 24, 2024Updated last year
- This repository contains several automated feature selection methods for CTR prediction.☆10Dec 18, 2022Updated 3 years ago
- A Datasette instance for searching WebVid-10M☆15Sep 30, 2022Updated 3 years ago
- [COLM 2025] Official code for "When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoni…☆15Oct 31, 2025Updated 4 months ago
- ☆10Jun 12, 2023Updated 2 years ago
- Introduction about AWESOME_ENTROPY+LRM_PAPERS☆30Dec 16, 2025Updated 3 months ago
- Code for the paper "Spectrum Guided Topology Augmentation for Graph Contrastive Learning"☆11Jul 18, 2023Updated 2 years ago
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.☆426Jul 11, 2025Updated 8 months ago
- Reinforcement Learning via Self-Distillation (SDPO)☆689Feb 18, 2026Updated last month
- ☆12Sep 8, 2020Updated 5 years ago
- Low-rank adaptation of large language models (LoRA) for Segment Anything 2.☆18Oct 31, 2024Updated last year
- ☆13Mar 8, 2024Updated 2 years ago
- Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity☆22Aug 28, 2025Updated 6 months ago
- [EMNLP 2024 Findings] Wrong-of-Thought: An Integrated Reasoning Framework with Multi-Perspective Verification and Wrong Information☆13Oct 1, 2024Updated last year
- UFT: Unifying Supervised and Reinforcement Fine-Tuning☆27Jun 30, 2025Updated 8 months ago
- AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence☆10Mar 2, 2025Updated last year
- ☆19May 14, 2025Updated 10 months ago
- [EMNLP '23] Discriminator-Guided Chain-of-Thought Reasoning☆50Oct 11, 2024Updated last year
- ☆11Oct 2, 2023Updated 2 years ago
- ☆12Oct 29, 2023Updated 2 years ago
- This repository is associated with the research paper titled ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large…☆15Jun 4, 2025Updated 9 months ago
- 🔥🔥🔥Latest Papers, Codes on Uncertainty-based RL☆59Aug 24, 2025Updated 7 months ago
- This repository organizes the ImageNet-1k dataset into 10 coarse classes, where each class consists of semantically similar image categorie…☆22Dec 11, 2023Updated 2 years ago