GeWu-Lab/MMCosine_ICASSP23

The code repo for ICASSP 2023 Paper "MMCosine: Multi-Modal Cosine Loss Towards Balanced Audio-Visual Fine-Grained Learning"

☆26

Alternatives and similar repositories for MMCosine_ICASSP23

Users who are interested in MMCosine_ICASSP23 are comparing it to the libraries listed below.

visipedia / ssw60
Sapsucker Woods 60 Audiovisual Dataset
☆19 · Oct 7, 2022 · Updated 3 years ago
ExplainableML / AVCA-GZSL
This repository contains the code for our CVPR 2022 paper on "Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and …
☆43 · Nov 29, 2022 · Updated 3 years ago
GeWu-Lab / CSOL_TPAMI2021
The repo for "Class-aware Sounding Objects Localization", TPAMI 2021.
☆29 · Mar 4, 2022 · Updated 4 years ago
GeWu-Lab / OGM-GE_CVPR2022
The repo for "Balanced Multimodal Learning via On-the-fly Gradient Modulation", CVPR 2022 (ORAL)
☆320 · Sep 22, 2025 · Updated 10 months ago
mmmistgun / Tipdm_Data_Analysis_II
Problem A of the 2nd "Teddy Cup" Data Analysis Vocational Skills Competition
☆10 · Sep 15, 2020 · Updated 5 years ago
ExplainableML / TCAF-GZSL
This repository contains the code for our ECCV 2022 paper "Temporal and cross-modal attention for audio-visual zero-shot learning"
☆25 · Sep 12, 2025 · Updated 10 months ago
GeWu-Lab / awesome-audiovisual-learning
A curated list of audio-visual learning methods and datasets.
☆288 · Dec 3, 2024 · Updated last year
sony / CLIPSep
☆43 · Feb 21, 2023 · Updated 3 years ago
Takaaki-Saeki / ssl_speech_restoration_v2
☆17 · Dec 18, 2023 · Updated 2 years ago
GeWu-Lab / LFAV
Towards Long Form Audio-visual Video Understanding
☆15 · Jan 16, 2026 · Updated 6 months ago
W-Wu / DEER
☆12 · Aug 25, 2023 · Updated 2 years ago
cyhuang-tw / robust-vc
☆11 · May 7, 2022 · Updated 4 years ago
usc-sail / mica-subtitle-aligned-movie-sounds
A dataset for Audio-Visual Sound Event Detection in Movies
☆26 · Jan 23, 2023 · Updated 3 years ago
weiguoPian / AV-CIL_ICCV2023
[ICCV 2023] Audio-Visual Class-Incremental Learning
☆35 · Sep 29, 2024 · Updated last year
b04901014 / FG-transformer-TTS
Official implementation for the paper Fine-grained style control in transformer-based text-to-speech synthesis.
☆90 · Mar 5, 2022 · Updated 4 years ago
haoyi-duan / DG-SCT
NeurIPS'2023 official implementation code
☆70 · Nov 11, 2023 · Updated 2 years ago
zjlww / dsp
Digital Speech Processing in PyTorch.
☆15 · Aug 12, 2022 · Updated 3 years ago
kaiw7 / STG-CMA
Towards Efficient Audio-Visual Learners via Empowering Pre-trained Vision Transformers with Cross-Modal Adaptation
☆15 · Apr 13, 2024 · Updated 2 years ago
PRIS-CV / An-Erudite-FGVC-Model
Code release for "An Erudite Fine-Grained Visual Classification Model" (CVPR 2023)
☆17 · Jun 2, 2023 · Updated 3 years ago
TaoRuijie / AVCleanse
ICASSP 2023: 'Speaker recognition with two-step multi-modal deep cleansing'
☆44 · Oct 31, 2022 · Updated 3 years ago
ttslr / MonTTS
☆16 · Dec 23, 2021 · Updated 4 years ago
PangzeCheung / Discrete-Probability-Flow
[NeurIPS 2023] Formulating Discrete Probability Flow Through Optimal Transport
☆21 · Jan 8, 2024 · Updated 2 years ago
DTaoo / DMC
Code for Deep Multimodal Clustering for Unsupervised Audiovisual Learning (CVPR2019)
☆15 · May 27, 2020 · Updated 6 years ago
billzyx / WavBERT
☆24 · May 16, 2024 · Updated 2 years ago
BriansIDP / AudioVisualLLM
☆19 · May 19, 2024 · Updated 2 years ago
UCSC-VLAA / Sight-Beyond-Text
[TMLR 2024] Official implementation of "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"
☆20 · Sep 15, 2023 · Updated 2 years ago
Sreyan88 / LAPE
A unified framework for Low-resource Audio Processing and Evaluation (SSL Pre-training and Downstream Fine-tuning)
☆29 · Jul 9, 2024 · Updated 2 years ago
speedyseal / audiosetdl
Scripts for downloading AudioSet
☆89 · Nov 7, 2017 · Updated 8 years ago
SY-Xuan / RT-MDNet
An implementation of RT-MDNet that supports newer PyTorch versions (1.0+).
☆18 · Jul 22, 2019 · Updated 7 years ago
ttgeng233 / UnAV
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)
☆73 · Jan 4, 2026 · Updated 6 months ago
aispeech-lab / LiMuSE
PyTorch implementation of LiMuSE
☆33 · Oct 11, 2022 · Updated 3 years ago
ZjjConan / VLM-LwEIB
The official PyTorch implementation of our IJCV 2025 paper "Learning with Enriched Inductive Biases for Vision-Language Models".
☆15 · Jul 6, 2026 · Updated 2 weeks ago
TeaPoly / CE-OptimizedLoss
Optimized loss based on cross-entropy (CE), like MWER (minimum WER) Loss with beam search and negative sampling strategy, Smoothed Max Po…
☆25 · Oct 11, 2024 · Updated last year
xcmyz / ConvTasNet4BasisMelGAN
This repo contains conv-tasnet for basis-melgan. For the basis-melgan code, please refer to FastVocoder.
☆21 · Jul 21, 2021 · Updated 5 years ago
youngseo0526 / X-AVDT
[CVPR 2026] X-AVDT: Audio-Visual Cross-Attention for Robust Deepfake Detection
☆17 · Jul 6, 2026 · Updated 2 weeks ago
DTaoo / Discriminative-Sounding-Objects-Localization
Code for Discriminative Sounding Objects Localization (NeurIPS 2020)
☆61 · Jan 19, 2022 · Updated 4 years ago
mavceleb / mavceleb_baseline
☆11 · Nov 5, 2025 · Updated 8 months ago
thuhcsi / icassp2021-emotion-tts
Please visit: https://thuhcsi.github.io/icassp2021-emotion-tts/
☆34 · Mar 17, 2023 · Updated 3 years ago
baist / SINet
☆22 · May 22, 2022 · Updated 4 years ago