Ming-er / LGC-SED
☆11Updated last year
Alternatives and similar repositories for LGC-SED:
Users who are interested in LGC-SED are comparing it to the libraries listed below.
- Official implementation of MGA-CLAP (ACM MM 2024)☆12Updated 3 months ago
- Full-frequency dynamic convolution: a physical frequency-dependent convolution for sound event detection☆16Updated 5 months ago
- The dataset and baseline code for Text-to-Audio Grounding (TAG)☆41Updated 2 weeks ago
- Download AudioSet data very quickly with youtube-dl, ffmpeg, and Python multiprocessing☆34Updated 5 months ago
- Baseline for IEEE ICME 2024 GC: Semi-supervised Acoustic Scene Classification under Domain Shift☆17Updated 10 months ago
- Baseline method for audio-visual sound event localization and detection task of DCASE 2023 challenge☆50Updated last year
- This repository collects papers related to Speech Tokenizer.☆15Updated 3 months ago
- Implementation of our paper 'On Metric Learning For Audio-Text Cross-Modal Retrieval'☆43Updated 2 years ago
- Source code for the paper 'Audio Captioning Transformer'☆52Updated 3 years ago
- 🦇 Encoder of BAT (Learning to Reason about Spatial Sounds with Large Language Models)☆38Updated 3 months ago
- MFF-EINV2: Multi-scale Feature Fusion across Spectral-Spatial-Temporal Domains for Sound Event Localization and Detection☆8Updated 6 months ago
- This package aims at simplifying the download of the AudioCaps dataset.☆31Updated last year
- Official Codebase of "A Closer Look at Weakly-Supervised Audio-Visual Source Localization" (NeurIPS 2022)☆16Updated 2 years ago
- The code repo for ICASSP 2023 Paper "MMCosine: Multi-Modal Cosine Loss Towards Balanced Audio-Visual Fine-Grained Learning"☆18Updated last year
- Research code for NeurIPS 2023 paper "Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser"☆16Updated last year
- [INTERSPEECH 2023 Best Paper Shortlist] Official implementation for MT4SSL: Boosting Self-Supervised Speech Representation Learning by In…☆44Updated 10 months ago
- Official Implementation of "Prefix tuning for Automated Audio Captioning" (ICASSP 2023)☆29Updated last year
- [IJCAI 2024] EAT: Self-Supervised Pre-Training with Efficient Audio Transformer☆123Updated last month
- A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline☆112Updated last month
- Source for the Interspeech 2024 Paper "Scaling up masked audio encoder learning for general audio classification"☆49Updated 3 weeks ago
- [CVPR 2023] Official implementation of our paper - Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learnin…☆23Updated last year
- A dataset for Audio-Visual Sound Event Detection in Movies☆26Updated 2 years ago
- CST-former: Transformer with Channel-Spectro-Temporal Attention for Sound Event Localization and Detection (ICASSP 2024)☆19Updated 3 weeks ago
- The official repo for Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation☆19Updated this week
- [ACL 2024] This is the PyTorch code for our paper "StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing"☆72Updated 2 months ago