Audio-Visual Generalized Zero-Shot Learning using Large Pre-Trained Models
☆23Apr 15, 2024Updated 2 years ago
Alternatives and similar repositories for ClipClap-GZSL
Users that are interested in ClipClap-GZSL are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- This repository contains the code for our CVPR 2022 paper on "Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and …☆42Nov 29, 2022Updated 3 years ago
- ☆11Apr 12, 2024Updated 2 years ago
- Rainbow Keywords - Official PyTorch Implementation☆14Jun 27, 2024Updated last year
- Learning Precise Affordances from Egocentric Videos for Robotic Manipulation (ICCV 2025)☆26Jan 30, 2026Updated 4 months ago
- Demo page of TAVGBench: Benchmarking Text to Audible-Video Generation☆15Apr 7, 2025Updated last year
- Virtual machines for every use case on DigitalOcean • AdGet dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
- This is the official code repository for the "Gradient-Guided Annealing for Domain Generalization" (CVPR 2025) paper.☆17Jul 22, 2025Updated 10 months ago
- This repository is for The Power of Sound(TPoS): Audio Reactive Video Generation with Stable Diffusion (ICCV2023)☆25Dec 7, 2023Updated 2 years ago
- An unofficial (PyTorch) implementation for the paper Deep Lip Reading: A comparison of models and an online application.☆10May 13, 2020Updated 6 years ago
- to release the source code for reproducing the results reported in our paper: https://arxiv.org/abs/2409.17550☆14Nov 15, 2024Updated last year
- [AAAI 2024] AVSegFormer: Audio-Visual Segmentation with Transformer☆75Mar 6, 2025Updated last year
- ☆23Mar 20, 2024Updated 2 years ago
- Official Code Repository for the paper "Generating Realistic Images from In-the-wild Sounds", ICCV 2023☆12Aug 24, 2025Updated 9 months ago
- Source codes for the paper "Personalized Dynamic Music Emotion Recognition with Dual-Scale Attention-Based Meta-Learning" (PDMER) which p…☆13Mar 24, 2025Updated last year
- [CVPR 2023] Official implementation of our paper - Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learnin…☆28Apr 10, 2023Updated 3 years ago
- Managed Database hosting by DigitalOcean • AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- The official repository of Qwen-VLA☆574May 29, 2026Updated 2 weeks ago
- GET: Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery (CVPR2025)☆36Mar 31, 2025Updated last year
- ☆14Nov 13, 2023Updated 2 years ago
- Official code for the paper: [ICCV2023] Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation☆43Dec 23, 2023Updated 2 years ago
- This repository contains the code for our ECCV 2022 paper "Temporal and cross-modal attention for audio-visual zero-shot learning"☆25Sep 12, 2025Updated 9 months ago
- ☆13Oct 30, 2023Updated 2 years ago
- This repo contains conv-tasnet for basis-melgan. If you want to get code of basis-melgan, please refer to FastVocoder.☆21Jul 21, 2021Updated 4 years ago
- The MAVD represents Mandarin Audio-Visual dataset with Depth information. MAVD has a rich variety of modal data, including audio, RGB ima…☆20Apr 22, 2024Updated 2 years ago
- Details of the datasets for Few-shot class-incremental audio classification☆10Dec 6, 2023Updated 2 years ago
- GPUs on demand by Runpod - Special Offer Available • AdRun AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
- An implementation of http://openaccess.thecvf.com/content_CVPRW_2019/papers/Sight%20and%20Sound/Konstantinos_Vougioukas_End-to-End_Speech…☆18Mar 19, 2020Updated 6 years ago
- ☆41Apr 14, 2025Updated last year
- Code for CLVision workshop (CVPR 2024) paper - Calibrating Higher-Order Statistics for Few-Shot Class-Incremental Learning with Pre-train…☆11Nov 12, 2024Updated last year
- ☆27Jul 15, 2024Updated last year
- ☆14Mar 21, 2025Updated last year
- FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis (Accepted by ISCSLP'2024)☆26Feb 22, 2024Updated 2 years ago
- This is my speaker recognition implementation based on the x-vector system described in "X-Vectors: Robust DNN Embeddings for Speaker Rec…☆10Nov 3, 2022Updated 3 years ago
- Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language (AAAI 2025)☆24Mar 17, 2025Updated last year
- Baseline system for CNVSRC2023 (Chinese Continuous Visual Speech Recognition Challenge 2023)☆23Apr 27, 2024Updated 2 years ago
- Proton VPN Special Offer - Get 70% off • AdSpecial partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
- [ICCV2023] CoTDet: Affordance Knowledge Prompting for Task Driven Object Detection☆19Apr 23, 2025Updated last year
- Chameleon: A Multiplier-Free Temporal Convolutional Network Accelerator for End-to-End Few-Shot and Continual Learning from Sequential Da…☆27Mar 5, 2026Updated 3 months ago
- Spiking CNN for object recognition☆12Apr 26, 2017Updated 9 years ago
- imu gps fuse use eskf☆11Jan 12, 2023Updated 3 years ago
- An implement of ORB-SLAM3 with python.☆10Jul 2, 2023Updated 2 years ago
- [AAAI 2024] The official PyTorch implementation of "Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation"☆129May 18, 2026Updated last month
- Official PyTorch implementation of "Conditional Generation of Audio from Video via Foley Analogies".☆93Dec 8, 2023Updated 2 years ago