sony/CLIPSep

sony / CLIPSep

☆43

Alternatives and similar repositories for CLIPSep

Users who are interested in CLIPSep are comparing it to the repositories listed below.

Aisaka0v0 / CLAPSep
Query-conditioned target sound extraction model
☆30 · Mar 25, 2025 · Updated last year
haiciyang / Remixing
Official repo of ICASSP 2022 paper - Don't Separate, Learn to Remix: End-to-End Neural Remixing with Joint Optimization
☆20 · Jan 7, 2025 · Updated last year
akoepke / audio-retrieval-benchmark
Code for "Audio Retrieval with Natural Language Queries: A Benchmark Study", Transactions on Multimedia 2022
☆54 · Jul 16, 2025 · Updated last year
XinhaoMei / ACT
Source code for the paper 'Audio Captioning Transformer'
☆56 · Jan 18, 2022 · Updated 4 years ago
lavendery / AudioComposer
☆27 · Sep 10, 2025 · Updated 10 months ago
liuxubo717 / LASS
This repo hosts the code and model of "Separate What You Describe: Language-Queried Audio Source Separation", Interspeech 2022
☆146 · Oct 11, 2023 · Updated 2 years ago
NKU-HLT / AudioEditor
☆47 · Apr 2, 2025 · Updated last year
rxtan2 / AVSeT
☆17 · Oct 2, 2023 · Updated 2 years ago
usc-sail / mica-subtitle-aligned-movie-sounds
A dataset for Audio-Visual Sound Event Detection in Movies
☆26 · Jan 23, 2023 · Updated 3 years ago
XinhaoMei / WavCaps
This repository contains metadata of the WavCaps dataset and code for downstream tasks.
☆264 · Jul 25, 2024 · Updated last year
GeWu-Lab / MMCosine_ICASSP23
The code repo for ICASSP 2023 Paper "MMCosine: Multi-Modal Cosine Loss Towards Balanced Audio-Visual Fine-Grained Learning"
☆26 · May 18, 2023 · Updated 3 years ago
RetroCirce / Zero_Shot_Audio_Source_Separation
The official code repo for "Zero-shot Audio Source Separation through Query-based Learning from Weakly-labeled Data", in AAAI 2022
☆212 · Jul 14, 2022 · Updated 4 years ago
jczhang02 / MUSIC_dataset_script
This repo contains a script to download the MUSIC dataset from YouTube
☆12 · Jan 19, 2024 · Updated 2 years ago
wsntxxn / TextToAudioGrounding
The dataset and baseline code for Text-to-Audio Grounding (TAG)
☆49 · Oct 23, 2025 · Updated 8 months ago
haoheliu / audioldm_eval
This toolbox aims to unify audio generation model evaluation for easier comparison.
☆390 · Sep 29, 2024 · Updated last year
bytedance / uss
This is the PyTorch implementation of the Universal Source Separation with Weakly labelled Data.
☆368 · Sep 1, 2023 · Updated 2 years ago
speedyseal / audiosetdl
Scripts for downloading AudioSet
☆89 · Nov 7, 2017 · Updated 8 years ago
roudimit / MUSIC_dataset
MUSIC Dataset from The Sound of Pixels (ECCV '18)
☆137 · Aug 12, 2022 · Updated 3 years ago
YuanGongND / uavm
Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models".
☆57 · Apr 20, 2023 · Updated 3 years ago
sangho-vision / acav100m
ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning. In ICCV, 2021.
☆64 · Nov 18, 2021 · Updated 4 years ago
stoneMo / OneAVM
Official Codebase of "A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition" (ICML 2023)
☆12 · Jun 1, 2023 · Updated 3 years ago
facebookresearch / AudioMAE
This repo hosts the code and models of "Masked Autoencoders that Listen".
☆671 · Apr 5, 2024 · Updated 2 years ago
haoxiangsnr / llm-tse
Typing to Listen at the Cocktail Party: Text-Guided Target Speaker Extraction (LLM-TSE)
☆43 · Oct 13, 2023 · Updated 2 years ago
WikiChao / DAVIS
[🏆 IJCV 2025 & ACCV 2024 Best Paper Honorable Mention] Official pytorch implementation of the paper "High-Quality Visually-Guided Sound …
☆33 · Mar 30, 2026 · Updated 3 months ago
neillu23 / CDiffuSE
Conditional Diffusion Probabilistic Model for Speech Enhancement
☆251 · Dec 20, 2022 · Updated 3 years ago
nttcslab / dcase2025_task4_baseline
☆18 · Apr 16, 2026 · Updated 3 months ago
cdjkim / audiocaps
🔊 Repository for our NAACL-HLT 2019 paper: AudioCaps
☆215 · Oct 6, 2025 · Updated 9 months ago
hkchengrex / av-benchmark
Benchmarking for Audio-Text and Audio-Visual Generation; Supports FAD, FD_VGG, FD_PANNs, FD_PaSST, IS_PaSST, IS_PANNs, KL_PaSST, KL_PANNs…
☆79 · Feb 14, 2026 · Updated 5 months ago
etzinis / optimal_condition_training
Code and data recipes for the paper: Optimal Condition Training for Target Source Separation by Efthymios Tzinis, Gordon Wichern, Paris S…
☆14 · Feb 15, 2023 · Updated 3 years ago
haoheliu / AudioLDM-training-finetuning
AudioLDM training, finetuning, evaluation and inference.
☆304 · Dec 13, 2024 · Updated last year
hche11 / Localizing-Visual-Sounds-the-Hard-Way
Localizing Visual Sounds the Hard Way
☆84 · Jul 6, 2022 · Updated 4 years ago
jaeyeonkim99 / EnCLAP
Official Implementation of EnCLAP (ICASSP 2024)
☆96 · Jun 2, 2024 · Updated 2 years ago
PapayaResearch / ctag
[ICML'24] Creative Text-to-Audio Generation via Synthesizer Programming
☆41 · Sep 26, 2024 · Updated last year
visipedia / ssw60
Sapsucker Woods 60 Audiovisual Dataset
☆19 · Oct 7, 2022 · Updated 3 years ago
luosiallen / Diff-Foley
Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models
☆205 · May 29, 2024 · Updated 2 years ago
qiuqiangkong / materials_for_students
☆16 · Aug 10, 2025 · Updated 11 months ago
hche11 / VGGSound
VGGSound: A Large-scale Audio-Visual Dataset
☆359 · Sep 13, 2021 · Updated 4 years ago
ericwudayi / SkipVQVC
An implementation of SkipVQVC with various settings.
☆75 · Jun 22, 2020 · Updated 6 years ago
JinhuaLiang / lam4fsl
An official repo for the paper "Adapting Language-Audio Models as Few-Shot Audio Learners"
☆31 · May 31, 2023 · Updated 3 years ago