PyTorch implementation of the Differential-Transformer architecture for sequence modeling, specifically tailored as a decoder-only model similar to large language models (LLMs). The architecture incorporates a novel Differential Attention mechanism, Multi-Head structure, RMSNorm, and SwiGLU.
☆86Oct 27, 2024Updated last year
Alternatives and similar repositories for Differential-Transformer-PyTorch
Users who are interested in Differential-Transformer-PyTorch are comparing it to the libraries listed below.
- An open-source community implementation of the model from the "DIFFERENTIAL TRANSFORMER" paper by Microsoft.☆39Mar 22, 2026Updated last week
- ☆13Oct 14, 2024Updated last year
- [KBS'25] Official implementation of "PRADA: Prompt-guided Representation Alignment and Dynamic Adaption for Time Series Forecasting".☆15May 28, 2025Updated 10 months ago
- official repo for `thinking with images through-self-calling`☆25Dec 28, 2025Updated 3 months ago
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated☆34Aug 14, 2024Updated last year
- Code for InterSpeech 2024 Paper: LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition☆19Jul 16, 2024Updated last year
- Pytorch implementation of NeurIPS'25 paper: Improving Time Series Forecasting via Instance-aware Post-hoc Revision☆48Oct 26, 2025Updated 5 months ago
- GoldFinch and other hybrid transformer components☆12Dec 9, 2025Updated 3 months ago
- Exquisite video generation☆14Feb 18, 2024Updated 2 years ago
- A toolkit for researchers in multimodal sound separation.☆16Oct 20, 2023Updated 2 years ago
- [AAAI 2026] Official repository of Circulant Attention☆40Jan 12, 2026Updated 2 months ago
- lncRNA-Py is a development package for applying machine learning and deep learning to the problem of lncRNA classification, i.e. predicti…☆12Jan 24, 2025Updated last year
- [CVPR 2026] Official Implementation of "Interact2Ar: Full-Body Human-Human Interaction Generation via Autoregressive Diffusion Models".☆16Feb 23, 2026Updated last month
- Official source code of HELM, a family of fully hyperbolic large language models☆33Feb 24, 2026Updated last month
- A demo system written for the visinger SVS system; at its core it is still a music player☆11Apr 18, 2023Updated 2 years ago
- [ICML 2025] Fourier Position Embedding: Enhancing Attention’s Periodic Extension for Length Generalization☆110Jun 2, 2025Updated 9 months ago
- Official repo of Future-aware Diverse Trends Framework for Recommendation☆11Jul 22, 2022Updated 3 years ago
- Official implementation of RMoE (Layerwise Recurrent Router for Mixture-of-Experts)☆29Aug 4, 2024Updated last year
- noise reduction☆17Jul 3, 2024Updated last year
- Entity-Aware and Motion-Aware Transformers for Language-driven Action Localization(IJCAI-22)☆12Oct 11, 2022Updated 3 years ago
- Automatic speech annotator processing speech with voice activity detection, overlapping speech detection, speaker diarization and automat…☆33Jun 14, 2024Updated last year
- [CVPR'24] Solving the Catastrophic Forgetting Problem in Generalized Category Discovery https://arxiv.org/pdf/2501.05272☆16Dec 24, 2024Updated last year
- Mamba R1 represents a novel architecture that combines the efficiency of Mamba's state space models with the scalability of Mixture of Ex…☆25Oct 13, 2025Updated 5 months ago
- ☆16Jul 17, 2025Updated 8 months ago
- Exploration into the Scaling Value Iteration Networks paper, from Schmidhuber's group☆37Sep 23, 2024Updated last year
- Train toy models using multi-token prediction objective☆14May 8, 2024Updated last year
- ☆11Dec 11, 2024Updated last year
- ☆24Sep 25, 2024Updated last year
- Implementation for paper Automata Extraction from Transformers.☆12Jun 8, 2024Updated last year
- The code for paper: "DC-Net: Divide-and-Conquer for Salient Object Detection"☆20Aug 30, 2024Updated last year
- ☆13Jan 11, 2026Updated 2 months ago
- HGRN2: Gated Linear RNNs with State Expansion☆56Aug 20, 2024Updated last year
- Med-DANet Series (ECCV 2022 & WACV 2024)☆13Jan 2, 2024Updated 2 years ago
- This is our implementation for the paper: Exploring Mixed Information Flow for Cross-domain Sequential Recommendations☆12Aug 17, 2020Updated 5 years ago
- Streaming Audiotransformers for online Audio tagging☆53Jun 14, 2024Updated last year
- Implementation of the proposed Adam-atan2 from Google Deepmind in Pytorch☆135Oct 15, 2025Updated 5 months ago
- [ECCV 2024] DHR: Dual Features-Driven Hierarchical Rebalancing in Inter- and Intra-Class Regions for Weakly-Supervised Semantic Segmentat…☆17Feb 26, 2025Updated last year
- Sequence alignment methods with helpers for PyTorch.☆24Nov 30, 2022Updated 3 years ago
- Large-scale Data Classification based on the Integrated Fusion of Fuzzy Learning and Graph Neural Network☆13Nov 2, 2023Updated 2 years ago