declare-lab / EFLA
Official Repo for Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics
☆69Updated 2 weeks ago
Alternatives and similar repositories for EFLA
Users that are interested in EFLA are comparing it to the libraries listed below
- [ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning☆134Updated 3 weeks ago
- Official PyTorch implementation and models for paper "Diffusion Beats Autoregressive in Data-Constrained Settings". We find diffusion mod…☆118Updated 2 months ago
- [Preprint] GMem: A Modular Approach for Ultra-Efficient Generative Models☆41Updated 9 months ago
- The official github repo for "Diffusion Language Models are Super Data Learners".☆215Updated 2 months ago
- The official repo of continuous speculative decoding☆31Updated 9 months ago
- RWKV-X is a Linear Complexity Hybrid Language Model based on the RWKV architecture, integrating Sparse Attention to improve the model's l…☆53Updated 5 months ago
- Implementations and experimentation on mHC by DeepSeek - https://arxiv.org/abs/2512.24880 ☆202Updated this week
- Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025☆31Updated 8 months ago
- [AAAI26] LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs☆50Updated last month
- [ICML 2025] Fourier Position Embedding: Enhancing Attention’s Periodic Extension for Length Generalization☆105Updated 7 months ago
- Remasking Discrete Diffusion Models with Inference-Time Scaling☆63Updated 10 months ago
- This is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation"☆40Updated last year
- Implementation of the proposed MaskBit from Bytedance AI☆83Updated last year
- [NeurIPS 2025] Official implementation for our paper "Scaling Diffusion Transformers Efficiently via μP".☆94Updated 2 months ago
- Tiny re-implementation of MDM in style of LLaDA and nano-gpt speedrun☆56Updated 10 months ago
- Landing repository for the paper "Predicting the Order of Upcoming Tokens Improves Language Modeling"☆41Updated 3 months ago
- Triton implementation of bi-directional (non-causal) linear attention☆60Updated 11 months ago
- Official PyTorch Implementation for Paper "No More Adam: Learning Rate Scaling at Initialization is All You Need"☆54Updated 11 months ago
- Easy and Efficient dLLM Fine-Tuning☆190Updated 3 weeks ago
- [ICLR 2025] Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegr…☆79Updated last year
- The official repo for "OpenMoE 2: Sparse Diffusion Language Models".☆51Updated 2 weeks ago
- Flash Attention Triton kernel with support for second-order derivatives☆129Updated 2 weeks ago
- ☆62Updated 6 months ago
- Official Implementation of Muddit [Meissonic II]: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model.☆96Updated last week
- The official implementation of our paper "CoRe^2: Collect, Reflect and Refine to Generate Better and Faster".☆29Updated 9 months ago
- Official implementation for SSDD Single-Step Diffusion Decoder for Efficient Image Tokenization.☆51Updated last month
- [NeurIPS 2025 Oral] Exploring Diffusion Transformer Designs via Grafting☆69Updated this week
- ☆263Updated 7 months ago
- Official Code Repository for the paper "Continuous Diffusion Model for Language Modeling" (NeurIPS 2025).☆55Updated 3 months ago
- A big_vision-inspired repo that implements a generic Auto-Encoder class capable of representation learning and generative modeling.☆34Updated last year