PyTorch implementation of the Differential-Transformer architecture for sequence modeling, specifically tailored as a decoder-only model similar to large language models (LLMs). The architecture incorporates a novel Differential Attention mechanism, Multi-Head structure, RMSNorm, and SwiGLU.
☆86Oct 27, 2024Updated last year
Alternatives and similar repositories for Differential-Transformer-PyTorch
Users that are interested in Differential-Transformer-PyTorch are comparing it to the libraries listed below.
- [CVPR 2025] Official PyTorch implementation of MaskSub "Masking meets Supervision: A Strong Learning Alliance"☆46Mar 25, 2025Updated last year
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated☆34Aug 14, 2024Updated last year
- Code for InterSpeech 2024 Paper: LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition☆19Jul 16, 2024Updated last year
- Pytorch implementation of NeurIPS'25 paper: Improving Time Series Forecasting via Instance-aware Post-hoc Revision☆49Oct 26, 2025Updated 5 months ago
- [ACCV 2024 (Oral, Best Application Paper)] Official Implementation of NT-VOT211: A Large-Scale Benchmark for Night-time Visual Object Tra…☆15Dec 30, 2025Updated 3 months ago
- A toolkit for researchers in the multimodal sound separation.☆16Oct 20, 2023Updated 2 years ago
- GoldFinch and other hybrid transformer components☆13Dec 9, 2025Updated 4 months ago
- [ICMR 2025] Official Repository for The Paper, Let Network Decide What to Learn: Symbolic Music Understanding Model Based on Large-scale …☆18Aug 17, 2025Updated 8 months ago
- [CVPR 2026] Official Implementation of "Interact2Ar: Full-Body Human-Human Interaction Generation via Autoregressive Diffusion Models".☆16Feb 23, 2026Updated last month
- Official source code of HELM, a family of fully hyperbolic large language models☆35Apr 4, 2026Updated 2 weeks ago
- A demo system written for the visinger SVS system; at its core it is still a music player☆11Apr 18, 2023Updated 3 years ago
- [ICML 2025] Fourier Position Embedding: Enhancing Attention’s Periodic Extension for Length Generalization☆111Jun 2, 2025Updated 10 months ago
- Official implementation of RMoE (Layerwise Recurrent Router for Mixture-of-Experts)☆30Aug 4, 2024Updated last year
- ☆23Apr 2, 2024Updated 2 years ago
- [AAAI 2026] Official repository of Circulant Attention☆47Jan 12, 2026Updated 3 months ago
- Entity-Aware and Motion-Aware Transformers for Language-driven Action Localization(IJCAI-22)☆12Oct 11, 2022Updated 3 years ago
- Automatic speech annotator processing speech with voice activity detection, overlapping speech detection, speaker diarization and automat…☆33Jun 14, 2024Updated last year
- code for "Combating Noise: Semi-supervised Learning by Region Uncertainty Quantification"☆10Mar 19, 2022Updated 4 years ago
- Mamba R1 represents a novel architecture that combines the efficiency of Mamba's state space models with the scalability of Mixture of Ex…☆25Oct 13, 2025Updated 6 months ago
- Library for using deepseek api☆14Jan 28, 2025Updated last year
- Weakly-Supervised Residual Evidential Learning for Multi-Instance Uncertainty Estimation (ICML 2024)☆15Jul 19, 2024Updated last year
- ESLTTS dataset☆16Feb 6, 2025Updated last year
- Code for master thesis on Zero-Shot Learning in multi-label scenarios☆14Mar 28, 2018Updated 8 years ago
- Exploration into the Scaling Value Iteration Networks paper, from Schmidhuber's group☆37Sep 23, 2024Updated last year
- Officially unofficial PyTorch code for the NIPS paper 'Natural-Parameter Networks: A Class of Probabilistic Neural Networks'☆11Sep 28, 2021Updated 4 years ago
- Histopathology Feature Extractors (2024)☆14Jun 14, 2024Updated last year
- convert a saved pytorch model to gguf and generate as much corresponding ggml c code as possible☆15Dec 19, 2023Updated 2 years ago
- ☆11Dec 11, 2024Updated last year
- Travelling salesman problem with 3opt move and 2opt perturbation☆23Jan 7, 2019Updated 7 years ago
- The code for paper: "DC-Net: Divide-and-Conquer for Salient Object Detection"☆20Aug 30, 2024Updated last year
- This is the official implementation to the EMNLP 2024 paper: Modeling Layout Reading Order as Ordering Relations for Visually-rich Docume…☆31Jan 19, 2026Updated 3 months ago
- ☆13Jan 11, 2026Updated 3 months ago
- HGRN2: Gated Linear RNNs with State Expansion☆57Aug 20, 2024Updated last year
- This is our implementation for the paper: Exploring Mixed Information Flow for Cross-domain Sequential Recommendations☆12Aug 17, 2020Updated 5 years ago
- Med-DANet Series (ECCV 2022 & WACV 2024)☆13Jan 2, 2024Updated 2 years ago
- [NeurIPS 2024] Official repository of InLine attention☆59Dec 22, 2024Updated last year
- Implementation of the proposed Adam-atan2 from Google Deepmind in Pytorch☆136Oct 15, 2025Updated 6 months ago
- (Interspeech 2023 & ICASSP 2024) Official repository for ARMHuBERT and STaRHuBERT☆41Aug 29, 2024Updated last year
- ☆14Jul 11, 2023Updated 2 years ago