This repository contains the code for our ECCV 2022 paper "Temporal and cross-modal attention for audio-visual zero-shot learning"
☆25Sep 12, 2025Updated 8 months ago
Alternatives and similar repositories for TCAF-GZSL
Users that are interested in TCAF-GZSL are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- This repository contains the code for our CVPR 2022 paper on "Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and …☆42Nov 29, 2022Updated 3 years ago
- [NeurIPS 2023 - ML for Audio Workshop (Oral)] Zero-shot audio captioning with audio-language model guidance and audio context keywords☆19Nov 30, 2024Updated last year
- CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations☆30Oct 27, 2023Updated 2 years ago
- The code repo for ICASSP 2023 Paper "MMCosine: Multi-Modal Cosine Loss Towards Balanced Audio-Visual Fine-Grained Learning"☆25May 18, 2023Updated 3 years ago
- Sapsucker Woods 60 Audiovisual Dataset☆18Oct 7, 2022Updated 3 years ago
- AI Agents on DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- Interaction Compass: Multi-Label Zero-Shot Learning of Human-Object Interactions via Spatial Relations @ ICCV21☆13Jul 15, 2022Updated 3 years ago
- ☆18Oct 5, 2024Updated last year
- The unofficial implementation of paper, "Objects that Sound", from ECCV 2018.☆31Jan 29, 2024Updated 2 years ago
- ☆31Sep 20, 2021Updated 4 years ago
- Implementation of "Audio Retrieval with Natural Language Queries", INTERSPEECH 2021, PyTorch☆26Aug 18, 2023Updated 2 years ago
- The repo for "Class-aware Sounding Objects Localization", TPAMI 2021.☆29Mar 4, 2022Updated 4 years ago
- My system for the DCASE 2022 Task 3 Sound Event Localizaiton and Detection.☆12Nov 12, 2022Updated 3 years ago
- Code of "What Images are More Memorable to Machines?"☆15Feb 13, 2023Updated 3 years ago
- 🤫A Lightweight One-Shot Whisper to Normal Voice Conversion Model Using Distillation of Self-Supervised Features☆23Dec 10, 2025Updated 6 months ago
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
- Ambiscaper: a tool for automatic dataset generation and annotation of reverberant Ambisonics audio. Originally forked from http://github.…☆22Sep 14, 2018Updated 7 years ago
- PrimeK-Net official code☆27Mar 5, 2025Updated last year
- Langchain_CrewAI_Gemini - An Gemini AI powered AI Agent (Multi-Agent) Project.☆14Mar 24, 2024Updated 2 years ago
- ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning. In ICCV, 2021.☆64Nov 18, 2021Updated 4 years ago
- Large Language-and-Vision Assistant for BioMedicine, built towards multimodal GPT-4 level capabilities.☆10Nov 29, 2023Updated 2 years ago
- ☆17Nov 5, 2020Updated 5 years ago
- Pytorch implementation of NASA: NEURAL ARTICULATED SHAPE APPROXIMATION☆12May 4, 2021Updated 5 years ago
- Pytorch implemenation of "Learning Filter Basis for Convolutional Neural Network Compression" ICCV2019☆18Jan 17, 2026Updated 4 months ago
- Paper list of compositional zero-shot learning☆11Jul 5, 2022Updated 3 years ago
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
- Implementation of "Knowing your FATE: Friendship, Action and Temporal Explanations for User Engagement Prediction on Social Apps"☆12Feb 21, 2020Updated 6 years ago
- Code for the paper: "Unsupervised Deep Clustering for Source Separation: Direct Learning from Mixtures using Spatial Information"☆21Oct 10, 2021Updated 4 years ago
- Code for the AVLnet (Interspeech 2021) and Cascaded Multilingual (Interspeech 2021) papers.☆54Mar 30, 2022Updated 4 years ago
- An implementation of tone enhancement. May refer to "Two-scale Tone Management for Photographic Look", SIGGRAPH 2006.☆13Mar 30, 2017Updated 9 years ago
- Code for "A Data-Centric Approach To Generate Faithful and High Quality Patient Summaries with Large Language Models"☆17Jul 20, 2025Updated 10 months ago
- Parse Symcat (http://www.symcat.com) symptoms and conditions and generate valid Synthea (https://github.com/synthetichealth/synthea) modu…☆16Jan 28, 2021Updated 5 years ago
- personalized-llms with allen institute☆13Jun 22, 2023Updated 2 years ago
- [BMVC 2021]: Official PyTorch implementation of : "Few Shot Temporal Action Localization using Query Adaptive Transformers"☆21Jul 12, 2022Updated 3 years ago
- ✨✨ [ICLR 2026] MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models☆42Apr 10, 2025Updated last year
- GPU virtual machines on DigitalOcean Gradient AI • AdGet to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
- Hospital simulator with pedestrians and robot☆15Oct 20, 2024Updated last year
- Speaker Diarization using GRU in PyTorch☆11Aug 29, 2020Updated 5 years ago
- Official page of "DeFTAN-II: Efficient multichannel speech enhancement with subgroup processing", IEEE/ACM Transactions on Audio, Speech,…☆33Nov 21, 2024Updated last year
- ☆14May 12, 2025Updated last year
- Image2Video 기반 나만의 움직이는 이모티콘 생성☆12Jan 23, 2021Updated 5 years ago
- code for COLING paper "A Hybrid Model of Classification and Generation for Spatial Relation Extraction"☆10Oct 20, 2022Updated 3 years ago
- This repository contains the code for our CVPR 2022 paper on "Integrating Language Guidance into Vision-based Deep Metric Learning".☆44Aug 9, 2022Updated 3 years ago