Audio-Visual Generalized Zero-Shot Learning using Large Pre-Trained Models
☆22Apr 15, 2024Updated last year
Alternatives and similar repositories for ClipClap-GZSL
Users that are interested in ClipClap-GZSL are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- This repository contains the code for our CVPR 2022 paper on "Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and …☆42Nov 29, 2022Updated 3 years ago
- ☆11Apr 12, 2024Updated last year
- Rainbow Keywords - Official PyTorch Implementation☆13Jun 27, 2024Updated last year
- Learning Precise Affordances from Egocentric Videos for Robotic Manipulation (ICCV 2025)☆20Jan 30, 2026Updated last month
- Demo page of TAVGBench: Benchmarking Text to Audible-Video Generation☆14Apr 7, 2025Updated 11 months ago
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click and start building anything your business needs.
- An unofficial (PyTorch) implementation for the paper Deep Lip Reading: A comparison of models and an online application.☆10May 13, 2020Updated 5 years ago
- This repository is for The Power of Sound(TPoS): Audio Reactive Video Generation with Stable Diffusion (ICCV2023)☆25Dec 7, 2023Updated 2 years ago
- to release the source code for reproducing the results reported in our paper: https://arxiv.org/abs/2409.17550☆14Nov 15, 2024Updated last year
- [AAAI 2024] AVSegFormer: Audio-Visual Segmentation with Transformer☆74Mar 6, 2025Updated last year
- Official Code Repository for the paper "Generating Realistic Images from In-the-wild Sounds", ICCV 2023☆12Aug 24, 2025Updated 7 months ago
- ☆22Mar 20, 2024Updated 2 years ago
- Source codes for the paper "Personalized Dynamic Music Emotion Recognition with Dual-Scale Attention-Based Meta-Learning" (PDMER) which p…☆13Mar 24, 2025Updated last year
- [CVPR 2023] Official implementation of our paper - Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learnin…☆27Apr 10, 2023Updated 2 years ago
- GET: Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery (CVPR2025)☆33Mar 31, 2025Updated 11 months ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting with the flexibility to host WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Cloudways by DigitalOcean.
- Official code for WACV 2024 paper, "Annotation-free Audio-Visual Segmentation"☆38Oct 11, 2024Updated last year
- ☆14Nov 13, 2023Updated 2 years ago
- Official code for the paper: [ICCV2023] Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation☆42Dec 23, 2023Updated 2 years ago
- This repository contains the code for our ECCV 2022 paper "Temporal and cross-modal attention for audio-visual zero-shot learning"☆25Sep 12, 2025Updated 6 months ago
- ☆13Oct 30, 2023Updated 2 years ago
- The MAVD represents Mandarin Audio-Visual dataset with Depth information. MAVD has a rich variety of modal data, including audio, RGB ima…☆20Apr 22, 2024Updated last year
- An implementation of http://openaccess.thecvf.com/content_CVPRW_2019/papers/Sight%20and%20Sound/Konstantinos_Vougioukas_End-to-End_Speech…☆18Mar 19, 2020Updated 6 years ago
- Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language (AAAI 2025)☆23Mar 17, 2025Updated last year
- Details of the datasets for Few-shot class-incremental audio classification☆11Dec 6, 2023Updated 2 years ago
- DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- This repo contains conv-tasnet for basis-melgan. If you want to get code of basis-melgan, please refer to FastVocoder.☆21Jul 21, 2021Updated 4 years ago
- ☆40Apr 14, 2025Updated 11 months ago
- Baseline system for CNVSRC2023 (Chinese Continuous Visual Speech Recognition Challenge 2023)☆22Apr 27, 2024Updated last year
- ☆24Jul 15, 2024Updated last year
- Code for CLVision workshop (CVPR 2024) paper - Calibrating Higher-Order Statistics for Few-Shot Class-Incremental Learning with Pre-train…☆11Nov 12, 2024Updated last year
- Text Clustering as Classification with LLMs☆16Oct 2, 2025Updated 5 months ago
- ☆14Mar 21, 2025Updated last year
- This is my speaker recognition implementation based on the x-vector system described in "X-Vectors: Robust DNN Embeddings for Speaker Rec…☆10Nov 3, 2022Updated 3 years ago
- FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis (Accepted by ISCSLP'2024)☆26Feb 22, 2024Updated 2 years ago
- Simple, predictable pricing with DigitalOcean hosting • AdAlways know what you'll pay with monthly caps and flat pricing. Enterprise-grade infrastructure trusted by 600k+ customers.
- Chameleon: A Multiplier-Free Temporal Convolutional Network Accelerator for End-to-End Few-Shot and Continual Learning from Sequential Da…☆26Mar 5, 2026Updated 3 weeks ago
- ☆110Updated this week
- ☆27Jun 27, 2023Updated 2 years ago
- An implement of ORB-SLAM3 with python.☆10Jul 2, 2023Updated 2 years ago
- Official PyTorch implementation of SGEM: Test-Time Adaptation for Automatic Speech Recognition via Sequential-Level Generalized Entropy M…☆37Aug 27, 2024Updated last year
- This repo contains the official PyTorch implementation of: Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptati…☆127Feb 13, 2025Updated last year
- Official PyTorch implementation of "Conditional Generation of Audio from Video via Foley Analogies".☆93Dec 8, 2023Updated 2 years ago