Sreyan88 / GAMA
Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
☆62 Updated last month
Related projects:
- AudioBench: A Universal Benchmark for Audio Large Language Models ☆61 Updated 2 weeks ago
- The open source code for LLM-Codec ☆106 Updated last month
- Audio Large Language Models ☆59 Updated this week
- EMO-SUPERB submission ☆27 Updated 2 weeks ago
- Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning' ☆81 Updated last month
- Official Implementation of EnCLAP (ICASSP 2024) ☆88 Updated 3 months ago
- [Interspeech 2024] Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation ☆64 Updated 3 weeks ago
- Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dial… ☆38 Updated last week
- Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model ☆75 Updated 2 weeks ago
- Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024) ☆127 Updated last year
- Ultra-low bitrate neural audio codec (0.31~1.40 kbps) with a better semantic in the latent space. ☆112 Updated 3 weeks ago
- PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities. ☆169 Updated 3 weeks ago
- Contains the code associated with the ICLR submission for our text-to-speech diffusion model ☆50 Updated 10 months ago
- Official release of StyleTalk dataset. ☆53 Updated 2 months ago
- Code for paper A3T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing ☆85 Updated last week
- 《SpeechPrompt v2: Prompt Tuning for Speech Classification Tasks》 Speech processing with prompting paradigm ☆80 Updated 11 months ago
- Audio Captioning datasets for PyTorch. ☆98 Updated 2 weeks ago
- [Official Implementation] Acoustic Autoregressive Modeling 🔥 ☆52 Updated 3 weeks ago
- Source for the Interspeech 2024 Paper "Scaling up masked audio encoder learning for general audio classification" ☆39 Updated last week
- Evaluation Protocol for Large-Scale Zero-Shot TTS Literature ☆44 Updated last month
- Unofficial download repository for MusicCaps ☆41 Updated last year
- 🦇 Encoder of BAT (Learning to Reason about Spatial Sounds with Large Language Models) ☆28 Updated 3 months ago
- Reference-aware automatic speech evaluation toolkit ☆95 Updated 6 months ago
- SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words ☆34 Updated 2 months ago
- Robust Singing Voice Transcription and MIDI Extraction ☆47 Updated last month
- Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986 ☆25 Updated 4 months ago