This repository contains the data (datasets, video/user summaries, CUS evaluation, and results) from the paper "VSUMM: A mechanism designed to produce static video summaries and a novel evaluation method." The repository was originally created in 2011 on Google Sites; that site is now inactive.
☆16 · Oct 13, 2024 · Updated last year
Alternatives and similar repositories for vsumm
Users interested in vsumm compare it to the repositories listed below.
- Code for skeleton image representations based on spatial structure of the skeleton joints (AVSS 2019 and SIBGRAPI 2019). ☆44 · Feb 4, 2020 · Updated 6 years ago
- Wav2vec resources and models for Brazilian Portuguese ☆37 · Jul 15, 2022 · Updated 3 years ago
- A Federated Learning Method for Real-time Emotion State Classification from Multi-modal Streaming ☆11 · Sep 15, 2022 · Updated 3 years ago
- ☆43 · Dec 1, 2025 · Updated 3 months ago
- Repo for the IDESSAI 2024 course on modeling audio with discrete tokens. ☆13 · Sep 13, 2024 · Updated last year
- ☆12 · Jun 1, 2024 · Updated last year
- A Model (maybe an app) that translates the audio of a video from one language to another language, cloning the voice of original video wi… ☆15 · May 19, 2025 · Updated 9 months ago
- KABooks is a tool to automate the process of creating datasets for training Text-To-Speech (TTS) and Speech-To-Text (STT) models. Using a… ☆12 · Mar 24, 2023 · Updated 2 years ago
- Python library to write, read, and verify transparency metadata in audio files for AI transparency compliance. ☆19 · Aug 17, 2025 · Updated 6 months ago
- Code for the EMNLP 2021 Oral paper "Are Gender-Neutral Queries Really Gender-Neutral? Mitigating Gender Bias in Image Search" https://arx… ☆12 · Feb 6, 2023 · Updated 3 years ago
- Researchers who published code, models (in some cases), and demo apps (in few cases) along with their SOTA paper ☆12 · Oct 19, 2023 · Updated 2 years ago
- CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval ☆13 · Jun 27, 2025 · Updated 8 months ago
- Awesome Multimodal Fusion in Speech Emotion Recognition ☆13 · Nov 11, 2025 · Updated 3 months ago
- semantic tokenizer for speech and music ☆21 · Jul 6, 2025 · Updated 8 months ago
- Statistical test for bias in unsupervised image representations. ☆12 · Mar 8, 2021 · Updated 5 years ago
- ☆13 · Jan 12, 2021 · Updated 5 years ago
- An example integration between Flask and the Preact front end library. ☆13 · Jun 20, 2022 · Updated 3 years ago
- PorSimplesSent - A Portuguese corpus of aligned sentences pairs to investigate sentence readability assessment ☆13 · Jan 15, 2020 · Updated 6 years ago
- Adversarially Learned Inference implemented with Keras2 ☆12 · Aug 1, 2019 · Updated 6 years ago
- FINALLY: Fast and universal speech enhancement model delivering studio-quality audio for a wide range of recordings. ☆25 · Dec 11, 2025 · Updated 2 months ago
- Code for "Can We Characterize Tasks Without Labels or Features?" (CVPR 2021) ☆11 · Aug 31, 2021 · Updated 4 years ago
- 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production ☆13 · Feb 4, 2026 · Updated last month
- 👄🇧🇷 Forced phonetic alignment in Brazilian Portuguese ☆12 · Jul 18, 2025 · Updated 7 months ago
- Implementation of "Improving Whispered Speech Recognition Performance using Pseudo-whispered based Data Augmentation" ☆13 · Oct 31, 2024 · Updated last year
- SANE-TTS: Stable And Natural End-to-End Multilingual Text-to-Speech ☆11 · Jun 30, 2023 · Updated 2 years ago
- A Chinese singing voice dataset, professional male singer, 105 songs, 132 minutes ☆11 · Oct 19, 2023 · Updated 2 years ago
- ☆13 · Jan 5, 2025 · Updated last year
- ☆14 · Jun 16, 2023 · Updated 2 years ago
- ☆11 · Feb 13, 2020 · Updated 6 years ago
- Cross-Speaker Encoding Network for Multi-talker Speech Recognition ☆11 · Mar 14, 2025 · Updated 11 months ago
- PyTorch implementation of Listen, Attend and Spell (LAS) speech recognition paper ☆12 · Mar 4, 2022 · Updated 4 years ago
- Codebase for 'A Real-Time Lyrics Alignment System Using Chroma And Phonetic Features For Classical Vocal Performance', ICASSP 2024 ☆13 · Oct 4, 2024 · Updated last year
- PyTorch implementation of the paper: A Global-local Attention Framework for Weakly Labelled Audio Tagging. ☆13 · Feb 6, 2021 · Updated 5 years ago
- ☆12 · Feb 9, 2021 · Updated 5 years ago
- Simple tool for speech dataset augmentation for modeling various prosodies. ☆14 · Jan 14, 2021 · Updated 5 years ago
- Trustworthy Speech Emotion Recognition ☆13 · May 22, 2023 · Updated 2 years ago
- DysfluentWFST ☆18 · Nov 13, 2025 · Updated 3 months ago
- Aty-TTS: Improving fairness for spoken language understanding in atypical speech with Text-to-Speech ☆11 · May 14, 2025 · Updated 9 months ago
- A simple command line tool to calculate WER for ASR. ☆14 · Oct 14, 2024 · Updated last year