A PyTorch implementation of the Differential-Transformer architecture for sequence modeling, built as a decoder-only model similar to large language models (LLMs). The architecture incorporates a novel Differential Attention mechanism, a multi-head structure, RMSNorm, and SwiGLU.
☆86 · Oct 27, 2024 · Updated last year
Alternatives and similar repositories for Differential-Transformer-PyTorch
Users who are interested in Differential-Transformer-PyTorch are comparing it to the libraries listed below.
- An open source community implementation of the model from the "DIFFERENTIAL TRANSFORMER" paper by Microsoft. ☆41 · Apr 20, 2026 · Updated 2 weeks ago
- official repo for `thinking with images through-self-calling` ☆26 · Dec 28, 2025 · Updated 4 months ago
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated ☆35 · Aug 14, 2024 · Updated last year
- [NeurIPS 2025] Watch and Listen: Understanding Audio-Visual-Speech Moments with Multimodal LLM ☆24 · Feb 10, 2026 · Updated 3 months ago
- Code for InterSpeech 2024 Paper: LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition ☆19 · Jul 16, 2024 · Updated last year
- Exquisite video generation ☆14 · Feb 18, 2024 · Updated 2 years ago
- A toolkit for researchers in the multimodal sound separation. ☆16 · Oct 20, 2023 · Updated 2 years ago
- GoldFinch and other hybrid transformer components ☆13 · Dec 9, 2025 · Updated 5 months ago
- [ICMR 2025] Official Repository for The Paper, Let Network Decide What to Learn: Symbolic Music Understanding Model Based on Large-scale … ☆18 · Aug 17, 2025 · Updated 8 months ago
- [CVPR 2026] Official Implementation of "Interact2Ar: Full-Body Human-Human Interaction Generation via Autoregressive Diffusion Models". ☆17 · Feb 23, 2026 · Updated 2 months ago
- An easy-to-use PyTorch library for Pathology Image ANalysis Orchestrator (PIANO), including generating patches from whole slide images, u… ☆43 · Aug 24, 2025 · Updated 8 months ago
- A PyTorch implementation of Determinantal Point Process Likelihoods for Sequential Recommendation ☆12 · Dec 9, 2024 · Updated last year
- [ICML 2025] Fourier Position Embedding: Enhancing Attention’s Periodic Extension for Length Generalization ☆112 · Jun 2, 2025 · Updated 11 months ago
- Official repo of Future-aware Diverse Trends Framework for Recommendation ☆11 · Jul 22, 2022 · Updated 3 years ago
- Official implementation of RMoE (Layerwise Recurrent Router for Mixture-of-Experts) ☆30 · Aug 4, 2024 · Updated last year
- [AAAI 2026] Official repository of Circulant Attention ☆52 · Jan 12, 2026 · Updated 3 months ago
- noise reduction ☆17 · Jul 3, 2024 · Updated last year
- Entity-Aware and Motion-Aware Transformers for Language-driven Action Localization (IJCAI-22) ☆12 · Oct 11, 2022 · Updated 3 years ago
- [CVPR'24] Solving the Catastrophic Forgetting Problem in Generalized Category Discovery https://arxiv.org/pdf/2501.05272 ☆16 · Dec 24, 2024 · Updated last year
- Automatic speech annotator processing speech with voice activity detection, overlapping speech detection, speaker diarization and automat… ☆33 · Jun 14, 2024 · Updated last year
- code for "Combating Noise: Semi-supervised Learning by Region Uncertainty Quantification" ☆10 · Mar 19, 2022 · Updated 4 years ago
- Mamba R1 represents a novel architecture that combines the efficiency of Mamba's state space models with the scalability of Mixture of Ex… ☆25 · Oct 13, 2025 · Updated 6 months ago
- Weakly-Supervised Residual Evidential Learning for Multi-Instance Uncertainty Estimation (ICML 2024) ☆15 · Jul 19, 2024 · Updated last year
- ESLTTS dataset ☆16 · Feb 6, 2025 · Updated last year
- ☆16 · Jan 12, 2023 · Updated 3 years ago
- convert a saved pytorch model to gguf and generate as much corresponding ggml c code as possible ☆15 · Dec 19, 2023 · Updated 2 years ago
- Exploration into the Scaling Value Iteration Networks paper, from Schmidhuber's group ☆37 · Sep 23, 2024 · Updated last year
- Histopathology Feature Extractors (2024) ☆14 · Jun 14, 2024 · Updated last year
- Train toy models using multi-token prediction objective ☆14 · Apr 18, 2026 · Updated 3 weeks ago
- ☆24 · Sep 25, 2024 · Updated last year
- Implementation for paper Automata Extraction from Transformers. ☆12 · Jun 8, 2024 · Updated last year
- ☆11 · Dec 11, 2024 · Updated last year
- ☆13 · Apr 26, 2026 · Updated last week
- HGRN2: Gated Linear RNNs with State Expansion ☆57 · Aug 20, 2024 · Updated last year
- ☆18 · Aug 18, 2024 · Updated last year
- ☆12 · Mar 30, 2021 · Updated 5 years ago
- Manually construct IP, TCP, UDP, and ICMP packets based on DPDK, commonly used for packet simulation, network security attack testing, fi… ☆10 · Nov 29, 2024 · Updated last year
- Implementation of "Time Interval-enhanced Graph Neural Network for Shared-account Cross-domain Sequential Recommendation" (TNNLS 2022) ☆12 · Mar 21, 2023 · Updated 3 years ago
- Implementation of the proposed Adam-atan2 from Google Deepmind in Pytorch ☆137 · Apr 28, 2026 · Updated last week