JeongHun0716 / MMS-LLaMA
Official PyTorch implementation for "MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens" (ACL 2025 Findings)
☆32Updated 3 months ago
Alternatives and similar repositories for MMS-LLaMA
Users who are interested in MMS-LLaMA are comparing it to the repositories listed below
- Official PyTorch implementation for "Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech …☆31Updated 4 months ago
- Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dial…☆40Updated 7 months ago
- Whisper-Flamingo [Interspeech 2024] and mWhisper-Flamingo [IEEE SPL 2025] for Audio-Visual Speech Recognition and Translation☆181Updated last month
- Official implementation of USR (NeurIPS 2024)☆33Updated 9 months ago
- ☆108Updated 2 weeks ago
- [Interspeech 2023] Intelligible Lip-to-Speech Synthesis with Speech Units☆43Updated 10 months ago
- Official release of StyleTalk dataset.☆69Updated last year
- small audio language model for reasoning☆74Updated 5 months ago
- BLSP-Emo: Towards Empathetic Large Speech-Language Models☆49Updated last year
- LUCY: Linguistic Understanding and Control Yielding Early Stage of Her☆55Updated 5 months ago
- Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT"☆43Updated this week
- The implementation for "Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions"☆43Updated 5 months ago
- [CVPR 2024] AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation☆37Updated last year
- Towards Comprehensive Evaluation for End-to-End Spoken Dialogue Models☆36Updated 3 weeks ago
- VoiceLDM: Text-to-Speech with Environmental Context☆184Updated last year
- [ACL 2024] This is the PyTorch code for our paper "StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing"☆90Updated 10 months ago
- Official PyTorch implementation of "Paralinguistics-Aware Speech-Empowered LLMs for Natural Conversation" (NeurIPS 2024)☆93Updated 9 months ago
- This repository contains the code and data for the paper EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control by Haozhe Chen,…☆78Updated 11 months ago
- Collection of works for evaluating (and analyzing) large audio-language models (LALMs)☆35Updated last month
- Official Implementation of EnCLAP (ICASSP 2024)☆94Updated last year
- AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension☆117Updated 9 months ago
- This is the official repository of the papers "Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers" and "Efficient Fi…☆38Updated last year
- [Interspeech 2024] SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization☆57Updated 5 months ago
- Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities☆142Updated 9 months ago
- Official code for "EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting"☆78Updated 4 months ago
- Official Implementation of the work "Audio Mamba: Bidirectional State Space Model for Audio Representation Learning"☆152Updated 9 months ago
- [INTERSPEECH 2024] EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark☆278Updated 5 months ago
- ☆36Updated 5 months ago
- A neural full-band audio codec for general audio sampled at 48 kHz with 7.5 kbps or 4.5 kbps.☆188Updated 2 months ago
- [ICASSP 2025] Official PyTorch implementation of "Large Language Models are Strong Audio-Visual Speech Recognition Learners".☆32Updated 2 months ago