czg1225 / dParallel
dParallel: Learnable Parallel Decoding for dLLMs
☆36 Updated last week
Alternatives and similar repositories for dParallel
Users who are interested in dParallel are comparing it to the libraries listed below.
- ☆98 Updated last month
- The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions". ☆15 Updated last month
- ☆62 Updated 3 months ago
- JudgeLRM: Large Reasoning Models as a Judge ☆40 Updated last month
- [NeurIPS'25] dKV-Cache: The Cache for Diffusion Language Models ☆110 Updated 5 months ago
- The official implementation for [NeurIPS2025 Oral] Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink… ☆95 Updated last month
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight) ☆128 Updated 3 months ago
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning ☆86 Updated 8 months ago
- ☆19 Updated 9 months ago
- ☆17 Updated 2 months ago
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623 ☆88 Updated last year
- [ICLR 2025] Official Pytorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia… ☆26 Updated 3 months ago
- ☆14 Updated 11 months ago
- [NeurIPS 2025] VeriThinker: Learning to Verify Makes Reasoning Model Efficient ☆58 Updated 3 weeks ago
- [ICML24] Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs ☆94 Updated 11 months ago
- Code for Heima ☆56 Updated 6 months ago
- Implementation of Negative-aware Finetuning (NFT) algorithm for "Bridging Supervised Learning and Reinforcement Learning in Math Reasonin… ☆42 Updated last month
- This is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation" ☆39 Updated last year
- [NeurIPS'25] The official code implementation for paper "R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Tok… ☆52 Updated last week
- ☆60 Updated last week
- [NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models ☆54 Updated 4 months ago
- [EMNLP'2025 Industry] Repo for "Z1: Efficient Test-time Scaling with Code" ☆65 Updated 6 months ago
- Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation, ICML 2024 ☆21 Updated last year
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization ☆47 Updated 3 months ago
- Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards" ☆47 Updated 3 weeks ago
- Official PyTorch implementation of the paper "Accelerating Diffusion Large Language Models with SlowFast Sampling: The Three Golden Princ… ☆33 Updated 3 months ago
- ☆10 Updated last year
- Official implementation of "Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning" ☆23 Updated 4 months ago
- AnchorAttention: Improved attention for LLMs long-context training ☆213 Updated 9 months ago
- Codes for Merging Large Language Models ☆33 Updated last year