A PyTorch implementation of the Differential Transformer architecture for sequence modeling, built as a decoder-only model in the style of large language models (LLMs). The architecture combines a Differential Attention mechanism with a multi-head structure, RMSNorm, and SwiGLU.
☆86 · Oct 27, 2024 · Updated last year
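The components named in the description can be sketched roughly as follows. This is a minimal illustration and not the repository's code: the class names (`RMSNorm`, `SwiGLU`, `DiffAttentionHead`), the single-head layout, and the plain learnable scalar `lam` are all assumptions made for brevity. The paper's formulation, as far as I understand it, reparameterizes lambda and adds per-head normalization, both of which are omitted here.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Rescale features by their root mean square (no mean subtraction)."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * inv_rms * self.weight


class SwiGLU(nn.Module):
    """Gated feed-forward block: down(silu(gate(x)) * up(x))."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden_dim, bias=False)
        self.w_up = nn.Linear(dim, hidden_dim, bias=False)
        self.w_down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))


class DiffAttentionHead(nn.Module):
    """One causal head computing softmax(Q1 K1^T) - lam * softmax(Q2 K2^T)."""

    def __init__(self, dim: int, head_dim: int, lam_init: float = 0.8):
        super().__init__()
        self.head_dim = head_dim
        # Queries and keys are projected to two groups; values to one.
        self.q_proj = nn.Linear(dim, 2 * head_dim, bias=False)
        self.k_proj = nn.Linear(dim, 2 * head_dim, bias=False)
        self.v_proj = nn.Linear(dim, head_dim, bias=False)
        # Assumption: a plain learnable scalar stands in for the paper's lambda.
        self.lam = nn.Parameter(torch.tensor(lam_init))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        seq_len = x.shape[1]
        q1, q2 = self.q_proj(x).chunk(2, dim=-1)
        k1, k2 = self.k_proj(x).chunk(2, dim=-1)
        v = self.v_proj(x)
        scale = 1.0 / math.sqrt(self.head_dim)
        # True above the diagonal: positions a token must not attend to.
        future = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device),
            diagonal=1,
        )

        def attn_map(q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
            scores = (q @ k.transpose(-2, -1)) * scale
            return F.softmax(scores.masked_fill(future, float("-inf")), dim=-1)

        # The second map is subtracted to cancel attention common to both.
        diff_map = attn_map(q1, k1) - self.lam * attn_map(q2, k2)
        return diff_map @ v
```

A decoder block in this style would typically wrap several such heads and a `SwiGLU` layer with `RMSNorm` and residual connections. How the repository itself arranges them is not stated here.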
Alternatives and similar repositories for Differential-Transformer-PyTorch
Users that are interested in Differential-Transformer-PyTorch are comparing it to the libraries listed below
- An open-source community implementation of the model from the "DIFFERENTIAL TRANSFORMER" paper by Microsoft. ☆38 · Feb 9, 2026 · Updated last month
- ☆13 · Oct 14, 2024 · Updated last year
- [CVPR 2026] Official Implementation of "Interact2Ar: Full-Body Human-Human Interaction Generation via Autoregressive Diffusion Models". ☆15 · Feb 23, 2026 · Updated 2 weeks ago
- GoldFinch and other hybrid transformer components ☆12 · Dec 9, 2025 · Updated 3 months ago
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated ☆33 · Aug 14, 2024 · Updated last year
- ☆13 · Jan 11, 2026 · Updated last month
- A collection of real-time audio effect algorithms implemented in C++. ☆19 · Jul 16, 2025 · Updated 7 months ago
- ESLTTS dataset ☆16 · Feb 6, 2025 · Updated last year
- [AAAI 2026] Official repository of Circulant Attention ☆28 · Jan 12, 2026 · Updated last month
- Code for InterSpeech 2024 Paper: LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition ☆18 · Jul 16, 2024 · Updated last year
- convert a saved pytorch model to gguf and generate as much corresponding ggml c code as possible ☆15 · Dec 19, 2023 · Updated 2 years ago
- Personal website ☆16 · Feb 20, 2026 · Updated 2 weeks ago
- Official implementation of RMoE (Layerwise Recurrent Router for Mixture-of-Experts) ☆29 · Aug 4, 2024 · Updated last year
- A toolkit for researchers in multimodal sound separation. ☆16 · Oct 20, 2023 · Updated 2 years ago
- ☆22 · Apr 2, 2024 · Updated last year
- [CVPR 2025] Official PyTorch implementation of MaskSub "Masking meets Supervision: A Strong Learning Alliance" ☆45 · Mar 25, 2025 · Updated 11 months ago
- Spatial Spectral Machine Learning ☆14 · Oct 15, 2025 · Updated 4 months ago
- (Interspeech 2023 & ICASSP 2024) Official repository for ARMHuBERT and STaRHuBERT ☆41 · Aug 29, 2024 · Updated last year
- Mamba R1 represents a novel architecture that combines the efficiency of Mamba's state space models with the scalability of Mixture of Ex… ☆25 · Oct 13, 2025 · Updated 4 months ago
- Scalable and Stable Parallelization of Nonlinear RNNs ☆29 · Updated this week
- Sequence alignment methods with helpers for PyTorch. ☆24 · Nov 30, 2022 · Updated 3 years ago
- ☆20 · Jul 12, 2023 · Updated 2 years ago
- The code for "AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference", Qingyue Yang, Jie Wang, Xing Li, Zhihai Wang, Ch… ☆28 · Jul 15, 2025 · Updated 7 months ago
- PyTorch implementation of the NeurIPS'25 paper: Improving Time Series Forecasting via Instance-aware Post-hoc Revision ☆48 · Oct 26, 2025 · Updated 4 months ago
- ☆46 · Oct 11, 2023 · Updated 2 years ago
- "Understanding and Accelerating Neural Architecture Search with Training-Free and Theory-Grounded Metrics" by Wuyang Chen, Xinyu Gong, Yu… ☆27 · Aug 20, 2023 · Updated 2 years ago
- ☆23 · Oct 17, 2024 · Updated last year
- HGRN2: Gated Linear RNNs with State Expansion ☆56 · Aug 20, 2024 · Updated last year
- [ICML 2025] Fourier Position Embedding: Enhancing Attention’s Periodic Extension for Length Generalization ☆110 · Jun 2, 2025 · Updated 9 months ago
- ☆23 · Oct 15, 2024 · Updated last year
- ☆24 · Sep 25, 2024 · Updated last year
- Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs ☆23 · Nov 11, 2025 · Updated 3 months ago
- C++ version of pyannote audio speaker diarization pipeline ☆22 · Feb 14, 2024 · Updated 2 years ago
- Fast and differentiable time domain all-pole filter in PyTorch. ☆68 · Feb 5, 2026 · Updated last month
- Codebase for the ICLR '23 paper "wav2tok: Deep Sequence Tokenizer for Audio Retrieval" ☆36 · Feb 10, 2026 · Updated 3 weeks ago
- ☆32 · Jan 7, 2024 · Updated 2 years ago
- Automatic speech annotator processing speech with voice activity detection, overlapping speech detection, speaker diarization and automat… ☆33 · Jun 14, 2024 · Updated last year
- Transcribing Speech with Multinomial Diffusion, training code and models. ☆80 · Sep 27, 2023 · Updated 2 years ago
- ☆14 · Jun 24, 2024 · Updated last year