TsinghuaC3I / Fourier-Position-Embedding
[ICML 2025] Fourier Position Embedding: Enhancing Attention’s Periodic Extension for Length Generalization
☆106 Updated 8 months ago
Alternatives and similar repositories for Fourier-Position-Embedding
Users who are interested in Fourier-Position-Embedding are comparing it to the libraries listed below.
- [ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning ☆137 Updated last month
- Easy and Efficient dLLM Fine-Tuning ☆208 Updated 2 weeks ago
- This is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation" ☆41 Updated last year
- ☆119 Updated 4 months ago
- [ICLR2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM. ☆105 Updated last year
- Discrete Diffusion Forcing (D2F): dLLMs Can Do Faster-Than-AR Inference ☆238 Updated 2 weeks ago
- Triton implementation of bi-directional (non-causal) linear attention ☆64 Updated last year
- Code for paper "Patch-Level Training for Large Language Models" ☆97 Updated 2 months ago
- ☆201 Updated 2 years ago
- Official Repo for Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics ☆70 Updated 3 weeks ago
- Implementations and experimentation on mHC by DeepSeek - https://arxiv.org/abs/2512.24880 ☆278 Updated last month
- Paper list, tutorial, and nano code snippet for Diffusion Large Language Models. ☆152 Updated 2 weeks ago
- Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint? ☆119 Updated last year
- [ICML'24 Oral] The official code of "DiJiang: Efficient Large Language Models through Compact Kernelization", a novel DCT-based linear at… ☆104 Updated last year
- Remasking Discrete Diffusion Models with Inference-Time Scaling ☆65 Updated this week
- The most open diffusion language model for code generation, releasing pretraining, evaluation, inference, and checkpoints. ☆510 Updated 2 months ago
- ☆137 Updated 2 weeks ago
- [ICLR 2026] TraceRL & TraDo-8B: Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models ☆419 Updated last week
- LLaDA2.0 is the diffusion language model series developed by InclusionAI team, Ant Group. ☆236 Updated last month
- Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation" ☆124 Updated last year
- [ICLR2025] DiffuGPT and DiffuLLaMA: Scaling Diffusion Language Models via Adaptation from Autoregressive Models ☆361 Updated 8 months ago
- [NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models ☆237 Updated 3 months ago
- ☆64 Updated 6 months ago
- ☆104 Updated 11 months ago
- [NeurIPS'25] dKV-Cache: The Cache for Diffusion Language Models ☆129 Updated 8 months ago
- [ICML 2025 Spotlight] Direct Discriminative Optimization: Reinforcing Diffusion/Autoregressive with GAN Discrimination ☆110 Updated last week
- Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes" ☆30 Updated last year
- Implementation of Negative-aware Finetuning (NFT) algorithm for "Bridging Supervised Learning and Reinforcement Learning in Math Reasonin… ☆68 Updated 4 months ago
- The official GitHub page for the survey paper "Discrete Tokenization for Multimodal LLMs: A Comprehensive Survey". And this paper is unde… ☆77 Updated 5 months ago
- Implementation of the proposed MaskBit from Bytedance AI ☆83 Updated last year