OpenMOSS / LongLLaDA
[AAAI26] LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs
☆51Updated last month
Alternatives and similar repositories for LongLLaDA
Users who are interested in LongLLaDA are comparing it to the repositories listed below.
- ☆73Updated 7 months ago
- ☆17Updated 5 months ago
- Official PyTorch implementation and models for paper "Diffusion Beats Autoregressive in Data-Constrained Settings". We find diffusion mod…☆119Updated 2 weeks ago
- Diffusion Language Models For Code Infilling Beyond Fixed-size Canvas☆99Updated 4 months ago
- ☆110Updated 4 months ago
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization☆51Updated 6 months ago
- Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards"☆56Updated 3 weeks ago
- Official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning"☆37Updated last year
- JudgeLRM: Large Reasoning Models as a Judge☆40Updated last month
- ☆55Updated 7 months ago
- This is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation"☆40Updated last year
- [NeurIPS '25] Multi-Token Prediction Needs Registers☆26Updated last month
- Esoteric Language Models☆109Updated 2 months ago
- Easy and Efficient dLLM Fine-Tuning☆203Updated last week
- dParallel: Learnable Parallel Decoding for dLLMs☆56Updated 3 months ago
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623☆89Updated last year
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling☆42Updated last month
- Official Implementation of our paper "THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning".☆29Updated 4 months ago
- [ICLR 2026] Geometric-Mean Policy Optimization☆98Updated this week
- The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions".☆15Updated 4 months ago
- Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025☆31Updated 8 months ago
- Remasking Discrete Diffusion Models with Inference-Time Scaling☆65Updated 10 months ago
- Unofficial Implementation of Selective Attention Transformer☆20Updated last year
- SIFT: Grounding LLM Reasoning in Contexts via Stickers☆57Updated 10 months ago
- Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation, ICML 2024☆22Updated last year
- [EMNLP'25 Industry] Repo for "Z1: Efficient Test-time Scaling with Code"☆68Updated 9 months ago
- ☆47Updated 3 months ago
- Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization☆81Updated last month
- A Sober Look at Language Model Reasoning☆92Updated 2 months ago
- Official repository for paper "DeepCritic: Deliberate Critique with Large Language Models"☆41Updated 7 months ago