soham97 / mellow
small audio language model for reasoning
☆64 Updated last month
Alternatives and similar repositories for mellow
Users who are interested in mellow are comparing it to the repositories listed below
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer. ☆88 Updated 5 months ago
- [Official Implementation] Acoustic Autoregressive Modeling ☆69 Updated 9 months ago
- LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation (INTERSPEECH 2024) ☆38 Updated 11 months ago
- The official implementation of V-AURA: Temporally Aligned Audio for Video with Autoregression (ICASSP 2025) ☆27 Updated 5 months ago
- Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986 ☆46 Updated 7 months ago
- Official implementation for FlowSep ☆49 Updated 5 months ago
- ☆50 Updated 2 months ago
- [ICASSP 2025] "FLowHigh: Towards efficient and high-quality audio super-resolution with single-step flow matching" ☆63 Updated 4 months ago
- ☆98 Updated last month
- ☆79 Updated this week
- [ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer ☆58 Updated 7 months ago
- Implementation of Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching (NeurIPS'24) ☆39 Updated 2 months ago
- ☆33 Updated last month
- A 6-million Audio-Caption Paired Dataset Built with a LLMs and ALMs-based Automatic Pipeline ☆133 Updated 5 months ago
- ☆23 Updated 7 months ago
- WavReward: Spoken Dialogue Models With Generalist Reward Evaluators ☆36 Updated 2 weeks ago
- ☆113 Updated 3 months ago
- This is the official implementation of our multi-channel multi-speaker multi-spatial neural audio codec architecture. ☆49 Updated 2 months ago
- Source code for DM-Codec. ☆43 Updated this week
- [ACL 2025 Main] UniCodec: a unified audio codec with a single codebook to support multi-domain audio data, including speech, music, and s… ☆118 Updated this week
- The open source code for SimpleSpeech series ☆138 Updated 7 months ago
- Implementation of Multi-Source Music Generation with Latent Diffusion. ☆24 Updated 8 months ago
- Contains the code associated with the ICLR submission for our text-to-speech diffusion model ☆53 Updated last year
- ☆47 Updated 4 months ago
- [ICASSP2025] Official code for VoiceDiT: Dual-Condition Diffusion Transformer for Environment-Aware Speech Synthesis ☆16 Updated last month
- ☆43 Updated 11 months ago
- LibriSpeech-Long is a benchmark dataset for long-form speech generation and processing. Released as part of "Long-Form Speech Generation … ☆67 Updated 5 months ago
- Encoder of BAT (Learning to Reason about Spatial Sounds with Large Language Models) ☆50 Updated 3 months ago
- SCOREQ: Speech COntrastive REgression for Quality Assessment (NeurIPS 2024) ☆72 Updated 4 months ago
- The official repo for Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation ☆38 Updated 3 weeks ago