Audio-Visual Generalized Zero-Shot Learning using Large Pre-Trained Models
☆22 · Apr 15, 2024 · Updated 2 years ago
Alternatives and similar repositories for ClipClap-GZSL
Users that are interested in ClipClap-GZSL are comparing it to the libraries listed below.
- This repository contains the code for our CVPR 2022 paper on "Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and … ☆42 · Nov 29, 2022 · Updated 3 years ago
- ☆11 · Apr 12, 2024 · Updated 2 years ago
- Rainbow Keywords - Official PyTorch Implementation ☆14 · Jun 27, 2024 · Updated last year
- Learning Precise Affordances from Egocentric Videos for Robotic Manipulation (ICCV 2025) ☆20 · Jan 30, 2026 · Updated 2 months ago
- Demo page of TAVGBench: Benchmarking Text to Audible-Video Generation ☆15 · Apr 7, 2025 · Updated last year
- This is the official code repository for the "Gradient-Guided Annealing for Domain Generalization" (CVPR 2025) paper. ☆18 · Jul 22, 2025 · Updated 8 months ago
- This repository is for The Power of Sound(TPoS): Audio Reactive Video Generation with Stable Diffusion (ICCV2023) ☆25 · Dec 7, 2023 · Updated 2 years ago
- An unofficial (PyTorch) implementation for the paper Deep Lip Reading: A comparison of models and an online application. ☆10 · May 13, 2020 · Updated 5 years ago
- to release the source code for reproducing the results reported in our paper: https://arxiv.org/abs/2409.17550 ☆14 · Nov 15, 2024 · Updated last year
- [AAAI 2024] AVSegFormer: Audio-Visual Segmentation with Transformer ☆74 · Mar 6, 2025 · Updated last year
- Official Code Repository for the paper "Generating Realistic Images from In-the-wild Sounds", ICCV 2023 ☆12 · Aug 24, 2025 · Updated 7 months ago
- ☆23 · Mar 20, 2024 · Updated 2 years ago
- Source codes for the paper "Personalized Dynamic Music Emotion Recognition with Dual-Scale Attention-Based Meta-Learning" (PDMER) which p… ☆13 · Mar 24, 2025 · Updated last year
- [CVPR 2023] Official implementation of our paper - Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learnin… ☆27 · Apr 10, 2023 · Updated 3 years ago
- Official code for WACV 2024 paper, "Annotation-free Audio-Visual Segmentation" ☆38 · Oct 11, 2024 · Updated last year
- GET: Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery (CVPR2025) ☆34 · Mar 31, 2025 · Updated last year
- ☆14 · Nov 13, 2023 · Updated 2 years ago
- Official code for the paper: [ICCV2023] Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation ☆42 · Dec 23, 2023 · Updated 2 years ago
- This repository contains the code for our ECCV 2022 paper "Temporal and cross-modal attention for audio-visual zero-shot learning" ☆25 · Sep 12, 2025 · Updated 7 months ago
- ☆13 · Oct 30, 2023 · Updated 2 years ago
- The MAVD represents Mandarin Audio-Visual dataset with Depth information. MAVD has a rich variety of modal data, including audio, RGB ima… ☆20 · Apr 22, 2024 · Updated last year
- An implementation of http://openaccess.thecvf.com/content_CVPRW_2019/papers/Sight%20and%20Sound/Konstantinos_Vougioukas_End-to-End_Speech… ☆18 · Mar 19, 2020 · Updated 6 years ago
- This repo contains conv-tasnet for basis-melgan. If you want to get code of basis-melgan, please refer to FastVocoder. ☆21 · Jul 21, 2021 · Updated 4 years ago
- Details of the datasets for Few-shot class-incremental audio classification ☆10 · Dec 6, 2023 · Updated 2 years ago
- ☆40 · Apr 14, 2025 · Updated last year
- ☆25 · Jul 15, 2024 · Updated last year
- Code for CLVision workshop (CVPR 2024) paper - Calibrating Higher-Order Statistics for Few-Shot Class-Incremental Learning with Pre-train… ☆11 · Nov 12, 2024 · Updated last year
- ☆14 · Mar 21, 2025 · Updated last year
- FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis (Accepted by ISCSLP'2024) ☆26 · Feb 22, 2024 · Updated 2 years ago
- This is my speaker recognition implementation based on the x-vector system described in "X-Vectors: Robust DNN Embeddings for Speaker Rec… ☆10 · Nov 3, 2022 · Updated 3 years ago
- Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language (AAAI 2025) ☆24 · Mar 17, 2025 · Updated last year
- Baseline system for CNVSRC2023 (Chinese Continuous Visual Speech Recognition Challenge 2023) ☆23 · Apr 27, 2024 · Updated last year
- [ICCV2023] CoTDet: Affordance Knowledge Prompting for Task Driven Object Detection ☆19 · Apr 23, 2025 · Updated 11 months ago
- Chameleon: A Multiplier-Free Temporal Convolutional Network Accelerator for End-to-End Few-Shot and Continual Learning from Sequential Da… ☆27 · Mar 5, 2026 · Updated last month
- ☆27 · Jun 27, 2023 · Updated 2 years ago
- This repo contains the official PyTorch implementation of: Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptati… ☆128 · Feb 13, 2025 · Updated last year
- ☆113 · Apr 9, 2026 · Updated last week
- Official PyTorch implementation of "Conditional Generation of Audio from Video via Foley Analogies". ☆93 · Dec 8, 2023 · Updated 2 years ago
- Official PyTorch implementation of SGEM: Test-Time Adaptation for Automatic Speech Recognition via Sequential-Level Generalized Entropy M… ☆38 · Aug 27, 2024 · Updated last year