sandraavila / vsumm
This repository contains the data (datasets, video and user summaries, CUS (Comparison of User Summaries) evaluation, and results) from the paper "VSUMM: A mechanism designed to produce static video summaries and a novel evaluation method." The repository was originally created in 2011 on Google Sites, which is now inactive.
☆16 · updated last year
Alternatives and similar repositories for vsumm
Users interested in vsumm are comparing it to the repositories listed below.
- [ICASSP 2023] FedAudio: A Federated Learning Benchmark for Audio and Speech Tasks · ☆51 · updated last year
- Implementation of "Audio Retrieval with Natural Language Queries: A Benchmark Study" · ☆54 · updated 6 months ago
- Awesome Multimodal Fusion in Speech Emotion Recognition · ☆13 · updated 3 months ago
- [AAAI 2024] XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning · ☆15 · updated last year
- Details of the datasets for Few-shot class-incremental audio classification · ☆11 · updated 2 years ago
- (no description) · ☆14 · updated 10 months ago
- [AAAI 2023 (Oral)] CrissCross: Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity · ☆25 · updated 2 years ago
- Can audio-visual integration strengthen robustness under multimodal attacks? · ☆29 · updated 3 years ago
- The code repo for ICASSP 2023 Paper "MMCosine: Multi-Modal Cosine Loss Towards Balanced Audio-Visual Fine-Grained Learning" · ☆25 · updated 2 years ago
- A curated list of awesome adversarial reprogramming and input prompting methods for neural networks since 2022 · ☆38 · updated 2 years ago
- Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation (ACM MM 2024) · ☆20 · updated 10 months ago
- Sapsucker Woods 60 Audiovisual Dataset · ☆17 · updated 3 years ago
- [INTERSPEECH 2023] Semi-supervised Learning for Speech Emotion Recognition On Federated Learning using Multiview Pseudo-Labeling · ☆25 · updated 3 years ago
- WildDESED: A LLM-Powered Dataset for Wild Domestic Environment Sound Event Detection · ☆17 · updated last year
- TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages · ☆18 · updated last year
- (SLT 2024) Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition · ☆13 · updated last year
- Code for the AVLnet (Interspeech 2021) and Cascaded Multilingual (Interspeech 2021) papers · ☆53 · updated 3 years ago
- This is the official repository of the papers "Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers" and "Efficient Fi… · ☆38 · updated last year
- Code for ICASSP 2024 Paper: RECAP: Retrieval-Augmented Audio Captioning · ☆16 · updated last year
- (no description) · ☆13 · updated last year
- Code for the C2KD paper (ICASSP 2023) · ☆18 · updated 2 years ago
- Download AudioSet data quickly with youtube-dl, ffmpeg, and Python multiprocessing · ☆45 · updated last year
- DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning · ☆53 · updated 2 years ago
- Multi-Scale Attention for Audio Question Answering · ☆28 · updated 2 years ago
- [ACII 2023] PEFT-SER: On the Use of Parameter Efficient Transfer Learning Approaches For Speech Emotion Recognition Using Pre-trained Spe… · ☆60 · updated last year
- Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch · ☆68 · updated 6 months ago
- Implementation of our paper 'On Metric Learning For Audio-Text Cross-Modal Retrieval' · ☆50 · updated 3 years ago
- Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection (ECCV 2022) · ☆68 · updated 2 years ago
- Official implementation of FOP method as described in "Fusion and Orthogonal Projection for Improved Face-Voice Association" · ☆20 · updated last month
- Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dial… · ☆40 · updated last year