Skyline-9 / Visionary-Vids
Multi-modal transformer approach for natural language query based joint video summarization and highlight detection
☆17 May 23, 2024 Updated last year
Alternatives and similar repositories for Visionary-Vids
Users who are interested in Visionary-Vids are comparing it to the repositories listed below
- ☆15 Aug 4, 2025 Updated 6 months ago
- A computing solution based on deep learning that allows the efficient generation of keyshot type spotlights from videos. ☆19 Jan 13, 2022 Updated 4 years ago
- Hierarchical Video-Moment Retrieval and Step-Captioning (CVPR 2023) ☆107 Jan 23, 2025 Updated last year
- Graph learning framework for long-term video understanding ☆71 Jul 13, 2025 Updated 7 months ago
- A PyTorch Implementation of CA-SUM from "Summarizing Videos using Concentrated Attention and Considering the Uniqueness and Diversity of … ☆31 Jun 29, 2022 Updated 3 years ago
- 📦 A collection of pastable code gathered from past projects ☆12 Sep 9, 2024 Updated last year
- SPA: Efficient User-Preference Alignment against Uncertainty in Medical Image Segmentation (ICCV 2025) ☆14 Sep 26, 2025 Updated 4 months ago
- ☆40 Apr 16, 2024 Updated last year
- Example application for creating an MVC Express + Node + TypeScript app and deploying it to Azure ☆10 Nov 8, 2018 Updated 7 years ago
- Cross modal background suppression for audio-visual event localization ☆36 Mar 18, 2022 Updated 3 years ago
- [CVPR 2024] MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos ☆37 Jan 29, 2025 Updated last year
- This is the official repository of the papers "Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers" and "Efficient Fi… ☆38 Jul 31, 2024 Updated last year
- UMT is a unified and flexible framework which can handle different input modality combinations, and output video moment retrieval and/or … ☆234 Apr 15, 2024 Updated last year
- [NeurIPS 2021] Moment-DETR code and QVHighlights dataset ☆342 Apr 18, 2024 Updated last year
- Code for paper, "TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency" ECCV 2022 ☆39 Feb 17, 2023 Updated 2 years ago
- This repository contains the codebase for MovieCLIP: Visual Scene Recognition in Movies ☆42 Oct 1, 2023 Updated 2 years ago
- This is the implementation of the paper Video Summarization by Learning from Unpaired Data (CVPR 2019) ☆37 Sep 5, 2019 Updated 6 years ago
- Implementation of "Look, Listen and Recognise: character-aware audio-visual subtitling" ☆19 Nov 3, 2025 Updated 3 months ago
- The implementation codes of paper: Multimodal Sentiment Analysis with Mutual Information-based Disentangled Representation Learning ☆18 May 8, 2025 Updated 9 months ago
- The code and data for "Summary-Oriented Vision Modeling for Multimodal Abstractive Summarization" ☆11 May 16, 2023 Updated 2 years ago
- Retrieval Augmented Generation, but no servers involved. Backed by S3 ☆12 Nov 3, 2023 Updated 2 years ago
- [ICCV 2023] UniVTG: Towards Unified Video-Language Temporal Grounding ☆374 May 8, 2024 Updated last year
- ☆12 Sep 11, 2023 Updated 2 years ago
- Experimental webapp for creating music using graphs ☆21 Aug 27, 2015 Updated 10 years ago
- Nankai University, College of Cyber Science: Principles of Computer Organization, 2023 spring ☆13 Jan 22, 2024 Updated 2 years ago
- A Framework for Symbolic MUsic Graph Explanations ☆10 Jul 30, 2025 Updated 6 months ago
- [CVPR 2024] "Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition" ☆12 Feb 27, 2024 Updated last year
- XState machines for common UI patterns ☆13 Mar 14, 2020 Updated 5 years ago
- ☆14 Aug 21, 2017 Updated 8 years ago
- ☆11 Sep 29, 2023 Updated 2 years ago
- ☆10 Oct 16, 2025 Updated 3 months ago
- ☆13 Oct 17, 2020 Updated 5 years ago
- Basic rover demo from Raspberry Pi with remote teleop over LiveKit ☆15 Jul 10, 2025 Updated 7 months ago
- Canvas Element Recorder for React, with really simple API ☆11 Oct 16, 2023 Updated 2 years ago
- Multimodal SER Model meant to be trained on recognising emotions from speech (text + acoustic data). Fine-tuned the DeBERTaV3 model, resp… ☆11 Jun 19, 2024 Updated last year
- Firebase application template built on moltres framework ☆12 Apr 17, 2023 Updated 2 years ago
- Awesome Multimodal Fusion in Speech Emotion Recognition ☆13 Nov 11, 2025 Updated 3 months ago
- Adding MIDI to sheet music SVG. A project for the Music Encoding Initiative (MEI) ☆10 Updated this week
- For ACL25 paper "WAFFLE: Multi-Modal Model for Automated Front-End Development" - by Shanchao Liang and Nan Jiang and Shangshu Qian and L… ☆11 May 28, 2025 Updated 8 months ago