A PyTorch implementation of the Differential Transformer architecture for sequence modeling, built as a decoder-only model in the style of large language models (LLMs). The architecture combines a novel differential attention mechanism with a multi-head structure, RMSNorm, and SwiGLU.
☆86 · Oct 27, 2024 · Updated last year
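For readers unfamiliar with differential attention, the core idea can be sketched as the difference of two causal softmax attention maps, with the second map scaled by a factor λ before both are applied to the values. The snippet below is a minimal single-head illustration in plain Python, not code from this repository: the function names and the fixed `lam` argument are assumptions made for the example, and it omits the learned λ re-parameterization, per-head normalization, and multi-head projections that a full implementation would include.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats (may contain -inf)."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(queries, keys):
    """Causal scaled dot-product attention weights, one row per query position."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    rows = []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            if j > i:
                # Decoder-only masking: position i cannot attend to the future.
                scores.append(float("-inf"))
            else:
                scores.append(scale * sum(a * b for a, b in zip(q, k)))
        rows.append(softmax(scores))
    return rows


def differential_attention(q1, k1, q2, k2, values, lam):
    """Hypothetical helper: (softmax(Q1 K1^T) - lam * softmax(Q2 K2^T)) @ V.

    `lam` is a plain float here; treating it as fixed is an assumption
    made to keep the sketch short.
    """
    a1 = attention_map(q1, k1)
    a2 = attention_map(q2, k2)
    width = len(values[0])
    out = []
    for row1, row2 in zip(a1, a2):
        weights = [w1 - lam * w2 for w1, w2 in zip(row1, row2)]
        out.append(
            [sum(w * v[c] for w, v in zip(weights, values)) for c in range(width)]
        )
    return out
```

Two sanity checks follow from the definition: with `lam = 0.0` the function reduces to ordinary causal attention, and when both query/key pairs are identical and `lam = 1.0` the two maps cancel and every output is zero.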
Alternatives and similar repositories for Differential-Transformer-PyTorch
Users that are interested in Differential-Transformer-PyTorch are comparing it to the libraries listed below.
- [KBS'25] Official implementation of "PRADA: Prompt-guided Representation Alignment and Dynamic Adaption for Time Series Forecasting". ☆14 · May 28, 2025 · Updated last year
- Official repo for `thinking with images through-self-calling` ☆25 · Dec 28, 2025 · Updated 5 months ago
- Code for the Interspeech 2024 paper: LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition ☆19 · Jul 16, 2024 · Updated last year
- [NeurIPS 2025] Watch and Listen: Understanding Audio-Visual-Speech Moments with Multimodal LLM ☆27 · Feb 10, 2026 · Updated 4 months ago
- PyTorch implementation of the NeurIPS'25 paper: Improving Time Series Forecasting via Instance-aware Post-hoc Revision ☆50 · Oct 26, 2025 · Updated 7 months ago
- Exquisite video generation ☆15 · Feb 18, 2024 · Updated 2 years ago
- GoldFinch and other hybrid transformer components ☆13 · Dec 9, 2025 · Updated 6 months ago
- [ICMR 2025] Official repository for the paper "Let Network Decide What to Learn: Symbolic Music Understanding Model Based on Large-scale …" ☆18 · Aug 17, 2025 · Updated 10 months ago
- [CVPR 2026] Official implementation of "Interact2Ar: Full-Body Human-Human Interaction Generation via Autoregressive Diffusion Models". ☆22 · Jun 1, 2026 · Updated 2 weeks ago
- Official source code of HELM, a family of fully hyperbolic large language models ☆37 · Apr 4, 2026 · Updated 2 months ago
- A demo system written for the visinger SVS system; at its core it is still a music player ☆11 · Apr 18, 2023 · Updated 3 years ago
- A PyTorch implementation of Determinantal Point Process Likelihoods for Sequential Recommendation ☆12 · Dec 9, 2024 · Updated last year
- Official repo of Future-aware Diverse Trends Framework for Recommendation ☆11 · Jul 22, 2022 · Updated 3 years ago
- Official implementation of RMoE (Layerwise Recurrent Router for Mixture-of-Experts) ☆31 · Aug 4, 2024 · Updated last year
- [ICML 2025] Fourier Position Embedding: Enhancing Attention’s Periodic Extension for Length Generalization ☆117 · Jun 2, 2025 · Updated last year
- Noise reduction ☆17 · Jul 3, 2024 · Updated last year
- Entity-Aware and Motion-Aware Transformers for Language-driven Action Localization (IJCAI-22) ☆12 · Oct 11, 2022 · Updated 3 years ago
- [CVPR'24] Solving the Catastrophic Forgetting Problem in Generalized Category Discovery https://arxiv.org/pdf/2501.05272 ☆16 · Dec 24, 2024 · Updated last year
- Automatic speech annotator processing speech with voice activity detection, overlapping speech detection, speaker diarization and automat… ☆33 · Jun 14, 2024 · Updated 2 years ago
- Mamba R1 represents a novel architecture that combines the efficiency of Mamba's state space models with the scalability of Mixture of Ex… ☆24 · Oct 13, 2025 · Updated 8 months ago
- ESLTTS dataset ☆16 · Feb 6, 2025 · Updated last year
- ☆16 · Jan 12, 2023 · Updated 3 years ago
- Convert a saved PyTorch model to GGUF and generate as much corresponding GGML C code as possible ☆15 · Dec 19, 2023 · Updated 2 years ago
- Histopathology Feature Extractors (2024) ☆14 · Jun 14, 2024 · Updated 2 years ago
- Exploration into the Scaling Value Iteration Networks paper, from Schmidhuber's group ☆37 · Sep 23, 2024 · Updated last year
- Train toy models using a multi-token prediction objective ☆14 · Apr 18, 2026 · Updated 2 months ago
- ☆24 · Sep 25, 2024 · Updated last year
- Official implementation of the EMNLP 2024 paper: Modeling Layout Reading Order as Ordering Relations for Visually-rich Docume… ☆32 · Jan 19, 2026 · Updated 5 months ago
- ☆13 · Apr 26, 2026 · Updated last month
- Code for the paper "DC-Net: Divide-and-Conquer for Salient Object Detection" ☆21 · Aug 30, 2024 · Updated last year
- HGRN2: Gated Linear RNNs with State Expansion ☆57 · Aug 20, 2024 · Updated last year
- Med-DANet Series (ECCV 2022 & WACV 2024) ☆13 · Jan 2, 2024 · Updated 2 years ago
- A Java-based open source electrocardiograph (ECG) analysis program, imported from Google Code. ☆20 · Feb 10, 2015 · Updated 11 years ago
- Group project "Algorithms for large-scale optimal transport". Implements ADMMs and Sinkhorn's algorithms. ☆11 · Jan 28, 2019 · Updated 7 years ago
- Streaming Audiotransformers for online audio tagging ☆56 · Jun 14, 2024 · Updated 2 years ago
- [ECCV 2024] DHR: Dual Features-Driven Hierarchical Rebalancing in Inter- and Intra-Class Regions for Weakly-Supervised Semantic Segmentat… ☆17 · Feb 26, 2025 · Updated last year
- Implementation of the proposed Adam-atan2 from Google DeepMind in PyTorch ☆137 · May 5, 2026 · Updated last month
- Sequence alignment methods with helpers for PyTorch. ☆24 · Nov 30, 2022 · Updated 3 years ago
- Experimental GPU language with meta-programming ☆31 · Sep 6, 2024 · Updated last year