JeongHun0716 / MMS-LLaMA
Official PyTorch implementation for "MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens"
☆19 Updated this week
Alternatives and similar repositories for MMS-LLaMA:
Users who are interested in MMS-LLaMA are comparing it to the repositories listed below.
- Official PyTorch implementation for "Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech … ☆19 Updated 2 weeks ago
- Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT" ☆37 Updated this week
- This is the official repository of the papers "Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers" and "Efficient Fi… ☆36 Updated 8 months ago
- This repository aims to collect Transformer-based sound event detection (SED) algorithms. ☆54 Updated 2 months ago
- Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation (ACM MM 2024) ☆16 Updated 2 weeks ago
- Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dial… ☆40 Updated 2 months ago
- A neural full-band audio codec for general audio sampled at 48 kHz with 7.5 kbps or 4.5 kbps. ☆120 Updated last week
- [Interspeech 2023] Intelligible Lip-to-Speech Synthesis with Speech Units ☆31 Updated 5 months ago
- [NeurIPS 2024] SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words ☆48 Updated 9 months ago
- Official release of StyleTalk dataset. ☆62 Updated 9 months ago
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer. ☆84 Updated 3 months ago
- BLSP-Emo: Towards Empathetic Large Speech-Language Models ☆43 Updated 9 months ago
- small audio language model for reasoning ☆50 Updated last week
- Official implementation of USR (NeurIPS 2024) ☆29 Updated 3 months ago
- [CVPR 2024] AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation ☆34 Updated 6 months ago
- ☆27 Updated 6 months ago
- ☆54 Updated last week
- This is an evolving repo for the paper "Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey". ☆127 Updated this week
- [ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer ☆52 Updated 5 months ago
- TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages ☆14 Updated 10 months ago
- A low-bitrate single-codebook 16 kHz speech codec based on focal modulation ☆81 Updated last month
- [Interspeech 2024] SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization ☆48 Updated last week
- AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension ☆85 Updated 3 months ago
- The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions. ☆113 Updated 3 months ago
- LibriSpeech-Long is a benchmark dataset for long-form speech generation and processing. Released as part of "Long-Form Speech Generation … ☆60 Updated 3 months ago
- SLMTokBench for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" ☆33 Updated last year
- Code for ACL 2024 main conference paper "Can We Achieve High-quality Direct Speech-to-Speech Translation Without Parallel Speech Data?". ☆24 Updated 9 months ago
- YOLO-Stutter: End-to-end Region-Wise Speech Dysfluency Detection ☆16 Updated 3 weeks ago
- ☆39 Updated last month
- Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities ☆119 Updated 3 months ago