umbertocappellazzo / Omni-AVSR
Official PyTorch implementation of "Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models".
☆25Updated last month
Alternatives and similar repositories for Omni-AVSR
Users who are interested in Omni-AVSR are comparing it to the repositories listed below.
- Official code of the paper: Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis.☆45Updated last year
- EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs☆42Updated 2 months ago
- ☆78Updated 7 months ago
- Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation☆61Updated 5 months ago
- Music production for silent film clips.☆30Updated 7 months ago
- ☆62Updated 6 months ago
- ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation☆97Updated last week
- The official code repository for SongPrep: A Preprocessing Framework and End-to-end Model for Full-song Structure Parsing and Lyrics Tran…☆122Updated last week
- FLM-Audio is an audio-language subversion of RoboEgo/FLM-Ego -- an omnimodal model with native full duplexity.☆52Updated last week
- Towards Fine-grained Audio Captioning with Multimodal Contextual Cues☆85Updated 2 months ago
- Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos☆25Updated last year
- ☆29Updated 9 months ago
- A text-conditional diffusion probabilistic model capable of generating high-fidelity audio.☆182Updated last year
- ☆61Updated 5 months ago
- Anim-400K: A dataset designed from the ground up for automated dubbing of video☆110Updated last year
- An official implementation of SwapAnyone.☆71Updated 9 months ago
- ☆41Updated 5 months ago
- ☆20Updated last year
- ☆46Updated 8 months ago
- ☆71Updated 2 months ago
- Official implementation of the paper "Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-based Embedding Rou…☆29Updated 2 months ago
- Reinforcing Text-Rich Video Reasoning with Visual Rumination☆27Updated 3 weeks ago
- JAM: A Tiny Flow-based Song Generator with Fine-grained Controllability and Aesthetic Alignment☆115Updated 4 months ago
- Official code for the CVPR 2024 Paper: Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language☆85Updated last year
- The official implementation of OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows☆121Updated 4 months ago
- Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers☆113Updated 6 months ago
- Official repo for paper "EMMA: Efficient Multimodal Understanding, Generation, and Editing with a Unified Architecture."☆58Updated this week
- [AAAI 2025] VQTalker: Towards Multilingual Talking Avatars through Facial Motion Tokenization☆53Updated last year
- This repository contains the code and data for the paper EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control by Haozhe Chen,…☆80Updated last year
- A collection of optimized utilities for text-to-audio processing, enhancing both training and inference workflows. This repository contai…☆41Updated 8 months ago