OpenMOSS / LongLLaDA
[AAAI26] LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs
☆51Updated 2 months ago
Alternatives and similar repositories for LongLLaDA
Users who are interested in LongLLaDA are comparing it to the repositories listed below.
- Official PyTorch implementation and models for paper "Diffusion Beats Autoregressive in Data-Constrained Settings". We find diffusion mod…☆119Updated 3 weeks ago
- [NeurIPS '25] Multi-Token Prediction Needs Registers☆26Updated last month
- Diffusion Language Models For Code Infilling Beyond Fixed-size Canvas☆99Updated this week
- Official repository for paper "DeepCritic: Deliberate Critique with Large Language Models"☆41Updated 7 months ago
- Official implementation of paper "ACON: Optimizing Context Compression for Long-horizon LLM Agents"☆55Updated 3 months ago
- ☆75Updated 7 months ago
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization☆51Updated 6 months ago
- Implementation of Negative-aware Finetuning (NFT) algorithm for "Bridging Supervised Learning and Reinforcement Learning in Math Reasonin…☆68Updated 5 months ago
- JudgeLRM: Large Reasoning Models as a Judge☆40Updated last week
- ☆17Updated 6 months ago
- Official Implementation of our paper "THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning".☆29Updated 4 months ago
- Official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning"☆37Updated last year
- This is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation"☆41Updated last year
- ☆110Updated 4 months ago
- [ICLR 2026] dParallel: Learnable Parallel Decoding for dLLMs☆58Updated last week
- Defeating the Training-Inference Mismatch via FP16☆181Updated 2 months ago
- [NeurIPS 2025] What Makes a Reward Model a Good Teacher? An Optimization Perspective☆42Updated 4 months ago
- ☆47Updated 4 months ago
- ☆23Updated last year
- Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards"☆57Updated last month
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623☆89Updated last year
- Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge☆104Updated last week
- Minimalist RL for Diffusion LLMs with SOTA reasoning performance (89.1% GSM8K). Official implementation of "The Flexibility Trap".☆111Updated 2 weeks ago
- The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions".☆15Updated 5 months ago
- ☆55Updated 8 months ago
- [ICLR 2026] Geometric-Mean Policy Optimization☆99Updated last week
- ☆64Updated 7 months ago
- Remasking Discrete Diffusion Models with Inference-Time Scaling☆65Updated last week
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling☆42Updated last month
- [NeurIPS 2024] Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study☆59Updated last year