Sreyan88 / GAMA
Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
☆111 Updated 2 months ago
Alternatives and similar repositories for GAMA:
Users who are interested in GAMA are comparing it to the repositories listed below.
- AudioBench: A Universal Benchmark for Audio Large Language Models ☆131 Updated last week
- Audio Captioning datasets for PyTorch. ☆114 Updated 3 months ago
- A 6-million Audio-Caption Paired Dataset Built with a LLMs and ALMs-based Automatic Pipeline ☆116 Updated 2 months ago
- AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model ☆165 Updated last month
- Reference-aware automatic speech evaluation toolkit ☆144 Updated 2 months ago
- Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization ☆168 Updated 7 months ago
- Versatile Evaluation of Speech and Audio ☆160 Updated this week
- Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024) ☆138 Updated last year
- [AAAI 2024] Code for CTX-vec2wav in UniCATS ☆127 Updated 8 months ago
- AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension ☆80 Updated 2 months ago
- Official Implementation of EnCLAP (ICASSP 2024) ☆90 Updated 9 months ago
- ☆43 Updated last month
- Implementation of SoundStorm built upon SpeechTokenizer. ☆108 Updated last year
- This is an evolving repo for the paper "Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey". ☆121 Updated last month
- UTokyo-SaruLab MOS Prediction System ☆152 Updated this week
- The open source code for LLM-Codec ☆128 Updated 6 months ago
- Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec" ☆133 Updated 5 months ago
- Evaluation Protocol for Large-Scale Zero-Shot TTS Literature ☆74 Updated 5 months ago
- PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities. ☆228 Updated 5 months ago
- Real-time Speech-Text Foundation Model Toolkit (wip) ☆129 Updated 4 months ago
- VoiceBench: Benchmarking LLM-Based Voice Assistants ☆131 Updated this week
- Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models ☆111 Updated this week
- [Interspeech 2024] Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation ☆140 Updated 2 weeks ago
- ACM MM 2024 FlashSpeech: Efficient Zero-Shot Speech Synthesis ☆127 Updated 5 months ago
- ARCH: Audio Representations benCHmark ☆42 Updated 6 months ago
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer. ☆77 Updated 2 months ago
- SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis ☆125 Updated 2 months ago
- "SpeechPrompt v2: Prompt Tuning for Speech Classification Tasks": Speech processing with prompting paradigm ☆81 Updated last year
- ☆63 Updated 5 months ago