YuanGongND / uavm
Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models".
☆57 · Apr 20, 2023 · Updated 2 years ago
Alternatives and similar repositories for uavm
Users who are interested in uavm are comparing it to the repositories listed below.
- Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder". ☆286 · Mar 20, 2024 · Updated last year
- experiments about AudioSet ☆43 · Jul 22, 2023 · Updated 2 years ago
- Phoneme segmentation using pre-trained speech models ☆55 · Nov 4, 2022 · Updated 3 years ago
- A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024) ☆58 · Apr 17, 2024 · Updated last year
- Unsupervised spoken sentence embeddings ☆14 · Dec 14, 2022 · Updated 3 years ago
- Code for the C2KD paper (ICASSP 2023) ☆18 · May 15, 2023 · Updated 2 years ago
- Sapsucker Woods 60 Audiovisual Dataset ☆17 · Oct 7, 2022 · Updated 3 years ago
- Easily turn large sets of audio urls to an audio dataset. ☆21 · Dec 27, 2022 · Updated 3 years ago
- DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning ☆53 · Jan 18, 2024 · Updated 2 years ago
- Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representations ☆100 · Jun 18, 2024 · Updated last year
- A python algorithm to change the pitch of the voice in real time ☆13 · Dec 13, 2020 · Updated 5 years ago
- (Interspeech 2023 & ICASSP 2024) Official repository for ARMHuBERT and STaRHuBERT ☆40 · Aug 29, 2024 · Updated last year
- Cross-model active contrastive coding ☆22 · Mar 17, 2021 · Updated 4 years ago
- Baseline for DCASE 2024 Task 9: "Language-Queried Audio Source Separation" ☆26 · Mar 27, 2024 · Updated last year
- Word Discovery in Visually Grounded, Self-Supervised Speech Models ☆26 · Dec 4, 2023 · Updated 2 years ago
- ☆43 · Feb 21, 2023 · Updated 2 years ago
- ☆10 · Apr 17, 2024 · Updated last year
- Interference removal algorithm for multitrack live recordings ☆11 · Jan 9, 2019 · Updated 7 years ago
- A curated list of audio-visual learning methods and datasets. ☆285 · Dec 3, 2024 · Updated last year
- The PyTorch implementation of the paper: Masked Spectrogram Prediction For Self-Supervised Audio Pre-Training ☆50 · Dec 17, 2024 · Updated last year
- Official Codebase of "A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition" (ICML 2023) ☆12 · Jun 1, 2023 · Updated 2 years ago
- Code for the TASLP paper "PSLA: Improving Audio Tagging With Pretraining, Sampling, Labeling, and Aggregation". ☆149 · Jul 13, 2023 · Updated 2 years ago
- A dataset for Audio-Visual Sound Event Detection in Movies ☆26 · Jan 23, 2023 · Updated 3 years ago
- Non-Autoregressive Predictive Coding ☆51 · Nov 3, 2020 · Updated 5 years ago
- Official repo for DisCoder: High-Fidelity Music Vocoder using Neural Audio Codecs presented at ICASSP 2025 ☆37 · Feb 24, 2025 · Updated 11 months ago
- Official codes for the paper "Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech" ☆27 · Feb 22, 2022 · Updated 3 years ago
- Splits for epic-sounds dataset ☆85 · Aug 2, 2025 · Updated 6 months ago
- Music Demixing Challenge Submission Repo ☆15 · Sep 8, 2023 · Updated 2 years ago
- Short-time Fourier transform (STFT) for JAX ☆15 · Dec 20, 2021 · Updated 4 years ago
- Code and dataset release for "PACS: A Dataset for Physical Audiovisual CommonSense Reasoning" (ECCV 2022) ☆17 · Dec 20, 2022 · Updated 3 years ago
- 4G GPU & 10 Minutes for train ☆12 · Aug 9, 2023 · Updated 2 years ago
- MusAV: a dataset of relative arousal-valence annotations for validation of audio models ☆17 · Dec 16, 2022 · Updated 3 years ago
- Multispeaker Community Vocoder Model for DiffSinger ☆39 · Aug 11, 2025 · Updated 6 months ago
- VGGSound: A Large-scale Audio-Visual Dataset ☆350 · Sep 13, 2021 · Updated 4 years ago
- Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing, ECCV, 2020. (Spotlight) ☆89 · Jul 25, 2024 · Updated last year
- WildVSR ☆21 · Dec 13, 2023 · Updated 2 years ago
- ☆13 · Nov 15, 2024 · Updated last year
- This is the GitHub repository for Data Augmentation for Saliency Prediction via Latent Diffusion paper in ECCV 2024, Milano, Italy ☆14 · Nov 7, 2024 · Updated last year
- [NeurIPS 2022] "Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Spee… ☆17 · Sep 19, 2023 · Updated 2 years ago