☆14 · May 20, 2025 · Updated 9 months ago
Alternatives and similar repositories for MM-When2Speak
Users who are interested in MM-When2Speak are comparing it to the repositories listed below
- A dataset of first-person monologue videos/transcript/annotations about "life lessons" in various domains. The main purpose is for multi-… ☆17 · Jan 8, 2025 · Updated last year
- ☆28 · Nov 25, 2024 · Updated last year
- ☆41 · May 15, 2025 · Updated 9 months ago
- [NeurIPS 2025] Reward-Instruct: A Reward-Centric Approach to Fast Photo-Realistic Image Generation ☆34 · Oct 24, 2025 · Updated 4 months ago
- [ICML 2024] Official Repository for the paper "Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models" ☆10 · Jul 19, 2024 · Updated last year
- Analogist: Out-of-the-box Visual In-Context Learning with Image Diffusion Model (SIGGRAPH 2024) ☆38 · Sep 10, 2024 · Updated last year
- [ECCV'24] Self-training Room Layout Estimation via Geometry-aware Ray-casting ☆15 · Jan 20, 2025 · Updated last year
- Official PyTorch implementation of The Linear Attention Resurrection in Vision Transformer ☆16 · Sep 7, 2024 · Updated last year
- ☆12 · Feb 16, 2024 · Updated 2 years ago
- Neural Homomorphic Vocoder optimized for singing voice synthesis ☆18 · Mar 2, 2026 · Updated last week
- ☆14 · Apr 29, 2025 · Updated 10 months ago
- Vision Transformer (ViT) models, with their attention mechanisms, revolutionized computer vision. By merging Class Activation Map (CAM) a… ☆13 · Aug 14, 2023 · Updated 2 years ago
- ☆12 · Jan 28, 2022 · Updated 4 years ago
- [ICCV 2023] Code for "Multi-task View Synthesis with Neural Radiance Fields" ☆11 · Oct 2, 2023 · Updated 2 years ago
- Recognize and extract a sudoku puzzle from an image, based on OpenCV and Python. ☆12 · Feb 15, 2019 · Updated 7 years ago
- A repository for the EMNLP 2021 paper "Is Information Density Uniform in Task-Oriented Dialogues?" and for the CoNLL 2021 paper "Analysin… ☆10 · Jun 17, 2024 · Updated last year
- [ICLR 2025] Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception ☆14 · Jul 4, 2025 · Updated 8 months ago
- This is the official implementation for IVA '19 paper "Analyzing Input and Output Representations for Speech-Driven Gesture Generation". ☆10 · Jul 12, 2022 · Updated 3 years ago
- Code used to run experiments for the ICLR 2023 paper "Computational Language Acquisition with Theory of Mind". ☆15 · Apr 27, 2023 · Updated 2 years ago
- Official implementation of "NoiseAR: AutoRegressing Initial Noise Prior for Diffusion Models" ☆18 · Jun 3, 2025 · Updated 9 months ago
- Interpreting CLIP with Hierarchical Sparse Autoencoders (ICML 2025) ☆21 · Jan 17, 2026 · Updated last month
- A PyTorch implementation of the paper "MMoT: Mixture-of-Modality-Tokens Transformer for Composed Multimodal Conditional Image Synthesis". ☆12 · Jan 16, 2023 · Updated 3 years ago
- [AAAI 2025] Official pytorch implementation of "Diffusion Model Patching via Mixture-of-Prompts" ☆13 · Dec 12, 2024 · Updated last year
- dMel: Speech Tokenization Made Simple ☆16 · May 13, 2025 · Updated 9 months ago
- ☆20 · Nov 21, 2025 · Updated 3 months ago
- ☆12 · Mar 5, 2024 · Updated 2 years ago
- Official Implementation of implicit reference attack ☆11 · Oct 16, 2024 · Updated last year
- Exposing Text-Image Inconsistency Using Diffusion Models (ICLR 2024) ☆10 · Jun 15, 2024 · Updated last year
- ☆10 · May 24, 2024 · Updated last year
- ☆13 · Sep 28, 2024 · Updated last year
- [🔥ACM MM2025] EchoMask: Speech-Queried Attention-based Mask Modeling for Holistic Co-Speech Motion Generation ☆23 · Dec 30, 2025 · Updated 2 months ago
- Codebase for "Decoding language spatial relations to 2D spatial arrangements" (Findings of EMNLP 2020). ☆11 · Feb 10, 2023 · Updated 3 years ago
- [TPAMI 2026] Enhancing MMDiT-Based Text-to-Image Models for Similar Subject Generation ☆11 · Nov 30, 2025 · Updated 3 months ago
- ☆13 · Jul 28, 2024 · Updated last year
- ☆18 · May 15, 2025 · Updated 9 months ago
- [ICLR 2025] This repo is the official implementation of "The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs". ☆13 · Jan 25, 2025 · Updated last year
- StimulerVoiceX is a denoising and speech enhancement system. It uses deep learning techniques to remove noise from speech signals and imp… ☆13 · Jul 19, 2023 · Updated 2 years ago
- ☆10 · Nov 23, 2023 · Updated 2 years ago
- Code for ACL 2023 main conference paper "Back Translation for Speech-to-text Translation Without Transcripts". ☆12 · Oct 25, 2023 · Updated 2 years ago