cai525 / Transformer4SED
This repository aims to collect Transformer-based sound event detection (SED) algorithms.
☆36 Updated 3 weeks ago
Related projects
Alternatives and complementary repositories for Transformer4SED
- Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT" ☆29 Updated last month
- Official Implementation of the work "Audio Mamba: Bidirectional State Space Model for Audio Representation Learning" ☆101 Updated 2 weeks ago
- [SLT'24] The official implementation of SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model ☆104 Updated last month
- 🦇 Encoder of BAT (Learning to Reason about Spatial Sounds with Large Language Models) ☆33 Updated last month
- This is the official repository of the papers "Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers" and "Efficient Fi… ☆36 Updated 3 months ago
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer. ☆67 Updated last week
- [Interspeech 2024] Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation ☆81 Updated this week
- A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline ☆67 Updated 2 weeks ago
- [Official Implementation] Acoustic Autoregressive Modeling 🔥 ☆57 Updated 3 months ago
- Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities ☆80 Updated 3 months ago
- Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model ☆104 Updated last month
- Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dial… ☆39 Updated 2 weeks ago
- AudioBench: A Universal Benchmark for Audio Large Language Models ☆94 Updated last week
- Ultra-low bitrate neural audio codec (0.31~1.40 kbps) with better semantics in the latent space. ☆150 Updated 3 months ago
- Official code of ElasticAST (Interspeech 2024 paper) ☆23 Updated 3 months ago
- Official Implementation of EnCLAP (ICASSP 2024) ☆90 Updated 5 months ago
- Full-frequency dynamic convolution: a physical frequency-dependent convolution for sound event detection ☆16 Updated 3 months ago
- Learning differentiable temporal resolution on time-series data. ☆33 Updated 2 years ago
- A unified framework for Low-resource Audio Processing and Evaluation (SSL Pre-training and Downstream Fine-tuning) ☆26 Updated 4 months ago
- [InterSpeech'2024] FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency ☆48 Updated last month
- Audio captioning recipe ☆44 Updated last week
- ARCH: Audio Representations benCHmark ☆38 Updated 2 months ago
- Official implementation for our paper "Audio Mamba: Selective State Spaces for Self-Supervised Audio Representations" ☆31 Updated 5 months ago
- PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities. ☆193 Updated last month
- An official repo for the paper "Adapting Language-Audio Models as Few-Shot Audio Learners" ☆28 Updated last year
- Source code for the paper 'Audio Captioning Transformer' ☆50 Updated 2 years ago
- This is the official implementation of our multi-channel multi-speaker multi-spatial neural audio codec architecture. ☆42 Updated 2 months ago