cai525 / Transformer4SED
This repository aims to collect Transformer-based sound event detection (SED) algorithms.
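Sound event detection (SED) models, Transformer-based ones included, usually output frame-level class probabilities that must then be decoded into timestamped events. The sketch below shows that common post-processing step (thresholding, median filtering, onset/offset extraction) in plain Python. It is not taken from the Transformer4SED repository. The function names, the 0.5 threshold, the 3-frame filter window and the 20 ms frame hop are hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical helpers, NOT from Transformer4SED: a sketch of generic SED post-processing.


def median_filter(values: List[int], window: int) -> List[int]:
    """Median-filter a binary frame sequence, replicating edge values as padding."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    if not values:
        return []
    half = window // 2
    padded = [values[0]] * half + list(values) + [values[-1]] * half
    # Each output frame is the median of the `window` frames centred on it.
    return [sorted(padded[i:i + window])[half] for i in range(len(values))]


def decode_events(
    probs: List[float],
    label: str,
    threshold: float = 0.5,     # assumed decision threshold
    window: int = 3,            # assumed median-filter length in frames
    hop_seconds: float = 0.02,  # assumed frame hop (20 ms)
) -> List[Tuple[float, float, str]]:
    """Convert one class's frame-level probabilities into (onset, offset, label) events."""
    # 1. Binarize each frame against the threshold.
    active = [1 if p >= threshold else 0 for p in probs]
    # 2. Smooth away isolated spikes and short gaps.
    active = median_filter(active, window)
    # 3. Scan for runs of active frames and convert frame indices to seconds.
    events = []
    onset: Optional[int] = None
    for i, is_active in enumerate(active):
        if is_active and onset is None:
            onset = i
        elif not is_active and onset is not None:
            events.append((onset * hop_seconds, i * hop_seconds, label))
            onset = None
    # Close an event that is still active at the last frame.
    if onset is not None:
        events.append((onset * hop_seconds, len(active) * hop_seconds, label))
    return events


if __name__ == "__main__":
    frame_probs = [0.1, 0.2, 0.9, 0.8, 0.1, 0.95, 0.9, 0.9, 0.2, 0.1]
    print(decode_events(frame_probs, "dog"))
```

In the example, two bursts of high probability are separated by a single low frame; the 3-frame median filter bridges that gap, so one "dog" event spanning frames 2 to 8 is returned. Practical systems often tune the threshold and filter length per class.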
Related projects
Alternatives and complementary repositories for Transformer4SED
- Official Implementation of the work "Audio Mamba: Bidirectional State Space Model for Audio Representation Learning"
- This is the official repository of the papers "Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers" and "Efficient Fi…
- 🦇 Encoder of BAT (Learning to Reason about Spatial Sounds with Large Language Models)
- [SLT'24] The official implementation of SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model
- Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT"
- Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dial…
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer.
- Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
- [Interspeech 2024] Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation
- Official code for the CVPR 2024 paper: Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language
- AudioBench: A Universal Benchmark for Audio Large Language Models
- PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities.
- A 6-million Audio-Caption Paired Dataset Built with an Automatic Pipeline Based on LLMs and ALMs
- [IJCAI 2024] EAT: Self-Supervised Pre-Training with Efficient Audio Transformer
- Source code for the paper 'Audio Captioning Transformer'
- Ultra-low-bitrate neural audio codec (0.31–1.40 kbps) with better semantics in the latent space.
- Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
- Audio captioning recipe
- [Official Implementation] Acoustic Autoregressive Modeling 🔥
- Source for the Interspeech 2024 Paper "Scaling up masked audio encoder learning for general audio classification"
- Official Implementation of EnCLAP (ICASSP 2024)
- An official repo for the paper "Adapting Language-Audio Models as Few-Shot Audio Learners"
- This is the official implementation of our multi-channel multi-speaker multi-spatial neural audio codec architecture.
- A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)
- Official source code of the INTERSPEECH 2023 paper: "Audio-Visual Speech Separation in Noisy Environments with a Lightweight Iterative Mo…
- Learning differentiable temporal resolution on time-series data.