Implementation of a multimodal diffusion transformer in PyTorch
☆107 · Jun 24, 2024 · Updated last year
Alternatives and similar repositories for multimodal-dit-pytorch
Users that are interested in multimodal-dit-pytorch are comparing it to the libraries listed below.
- Implementation of TiTok, proposed by Bytedance in "An Image is Worth 32 Tokens for Reconstruction and Generation" ☆182 · Jun 20, 2024 · Updated last year
- Implementation of the proposed Adam-atan2 from Google Deepmind in Pytorch ☆135 · Oct 15, 2025 · Updated 5 months ago
- CUDA implementation of autoregressive linear attention, with all the latest research findings ☆46 · May 23, 2023 · Updated 2 years ago
- Implementation of the general framework for AMIE, from the paper "Towards Conversational Diagnostic AI", out of Google Deepmind ☆74 · Sep 16, 2024 · Updated last year
- Codebase for the Paper: Learning Visual Styles from Audio-Visual Associations (ECCV 2022, in PyTorch) ☆15 · Jan 26, 2023 · Updated 3 years ago
- Official Implementation of EnCLAP (ICASSP 2024) ☆94 · Jun 2, 2024 · Updated last year
- Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind ☆135 · Nov 1, 2025 · Updated 4 months ago
- ☆23 · Oct 15, 2024 · Updated last year
- Explorations into the recently proposed Taylor Series Linear Attention ☆100 · Aug 18, 2024 · Updated last year
- ☆13 · Jun 3, 2024 · Updated last year
- Implementation of the LDP module block in PyTorch and Zeta from the paper: "MobileVLM: A Fast, Strong and Open Vision Language Assistant … ☆15 · Mar 11, 2024 · Updated 2 years ago
- Implementation of the proposed LVMAE, from the paper, Extending Video Masked Autoencoders to 128 frames, in Pytorch ☆55 · Nov 25, 2024 · Updated last year
- Implementation of the Kalman Filtering Attention proposed in "Kalman Filtering Attention for User Behavior Modeling in CTR Prediction" ☆59 · Oct 22, 2023 · Updated 2 years ago
- Implementation of Insertion-deletion Denoising Diffusion Probabilistic Models ☆30 · May 31, 2022 · Updated 3 years ago
- An auxiliary project analysis of the characteristics of KV in DiT Attention. ☆34 · Nov 29, 2024 · Updated last year
- My hybrid TTS network that combines, VALL-E, VoiceBox, SpeechFlow, Seamless and TortoiseTTS into one ☆26 · Aug 5, 2024 · Updated last year
- ☆17 · Feb 16, 2024 · Updated 2 years ago
- Exploration into the Scaling Value Iteration Networks paper, from Schmidhuber's group ☆37 · Sep 23, 2024 · Updated last year
- Implementation of Gradient Agreement Filtering, from Chaubard et al. of Stanford, but for single machine microbatches, in Pytorch ☆25 · Jan 21, 2025 · Updated last year
- Researchers who published code, models (in some cases), and demo apps (in few cases) along with their SOTA paper ☆12 · Oct 19, 2023 · Updated 2 years ago
- ☆55 · Nov 5, 2024 · Updated last year
- Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition", out of Google Deepmind ☆179 · Sep 12, 2024 · Updated last year
- [InterSpeech'2024] FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency ☆60 · Oct 23, 2024 · Updated last year
- Pushing the Limits of Zero-shot End-to-End Speech Translation ☆26 · Dec 12, 2024 · Updated last year
- ☆14 · May 21, 2024 · Updated last year
- Universal Notation for Tensor Operations in Python ☆476 · Mar 14, 2026 · Updated last week
- AgentParse is a high-performance parsing library designed to map various structured data formats (such as Pydantic models, JSON, YAML, an… ☆18 · Oct 13, 2025 · Updated 5 months ago
- AudioSR-Upsampling (any -> 48kHz) ☆42 · Feb 13, 2024 · Updated 2 years ago
- [INTERSPEECH 2024] Official code for VoxSim: A perceptual voice similarity dataset ☆12 · Sep 29, 2025 · Updated 5 months ago
- Utilities for PyTorch distributed ☆25 · Feb 27, 2025 · Updated last year
- A JAX implementation of the continuous time formulation of Consistency Models ☆85 · Apr 7, 2023 · Updated 2 years ago
- Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in Pytorch ☆515 · Dec 20, 2025 · Updated 3 months ago
- DEX-TTS: Diffusion-based EXpressive TTS with Style Modeling on Time Variability ☆107 · Jan 17, 2025 · Updated last year
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer. ☆114 · Jan 28, 2026 · Updated last month
- ☆43 · May 3, 2024 · Updated last year
- Explorations into NEAT and some of its derivative research ☆33 · Dec 14, 2025 · Updated 3 months ago
- FLOPS counter for all your GPU benchmarking needs ☆13 · Aug 8, 2024 · Updated last year
- Focused on fast experimentation and simplicity ☆80 · Dec 24, 2024 · Updated last year
- Code for Fast Training of Diffusion Models with Masked Transformers ☆423 · May 15, 2024 · Updated last year