alibaba / alimama-video-narrator
Research code for ACL2024 paper: "Synchronized Video Storytelling: Generating Video Narrations with Structured Storyline"
☆41Dec 27, 2024Updated last year
Alternatives and similar repositories for alimama-video-narrator
Users that are interested in alimama-video-narrator are comparing it to the libraries listed below
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration☆26Oct 17, 2024Updated last year
- ☆60Jun 20, 2024Updated last year
- ☆18Jun 10, 2025Updated 8 months ago
- [ECCV 2024] ControlCap: Controllable Region-level Captioning☆80Oct 25, 2024Updated last year
- Student course selection system☆11Mar 1, 2023Updated 2 years ago
- [ICLR 2025] IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model☆37Nov 27, 2024Updated last year
- [NeurIPS'23] The official implementation of paper "Bitstream-corrupted Video Recovery: A Novel Benchmark Dataset and Method"☆42Jul 25, 2025Updated 6 months ago
- Awesome MLLMs/Benchmarks for Short/Long/Streaming Video Understanding☆61Sep 1, 2025Updated 5 months ago
- Code and data for the paper "Steering Conversational Large Language Models for Long Emotional Support Conversations" along with a UI to v…☆14Apr 14, 2025Updated 10 months ago
- [ACL 2025] Official code for "Learning to Reason from Feedback at Test-Time".☆13May 16, 2025Updated 9 months ago
- [ACM MM 2023] The released code of paper "Deconfounded Visual Question Generation with Causal Inference"☆11Sep 3, 2024Updated last year
- Official Code of ICCV 2021 Paper: Learning to Cut by Watching Movies☆50Nov 9, 2022Updated 3 years ago
- [AAAI 26 Demo] Official repo for CAT-V - Caption Anything in Video: Object-centric Dense Video Captioning with Spatiotemporal Multimodal P…☆64Jan 27, 2026Updated 3 weeks ago
- ☆11Aug 27, 2024Updated last year
- BookWorm: A Dataset for Character Description and Analysis [EMNLP Findings 2024]☆14Feb 28, 2025Updated 11 months ago
- ☆14Dec 25, 2024Updated last year
- ☆15Oct 23, 2023Updated 2 years ago
- ☆13May 21, 2024Updated last year
- This is the official repository of Physics-AD☆18Jun 3, 2025Updated 8 months ago
- Official InfiniBench: A Benchmark for Large Multi-Modal Models in Long-Form Movies and TV Shows☆19Nov 4, 2025Updated 3 months ago
- NightSurveillance Dataset for Pedestrian Detection☆11Jul 30, 2020Updated 5 years ago
- ☆11Nov 14, 2024Updated last year
- A digital twin of the city of Chicago along with automated sensors☆12Nov 14, 2019Updated 6 years ago
- Repo for the walking robot's vision based navigation code☆10Jun 6, 2023Updated 2 years ago
- UNet-Pruning b developing NNI☆10Sep 2, 2020Updated 5 years ago
- ☆19Aug 7, 2025Updated 6 months ago
- MXNet-Gluon model to Caffe (support SSD in gluoncv)☆10Jun 20, 2019Updated 6 years ago
- This is the official PyTorch code for our paper "Artemis: Structured Visual Reasoning for Perception Policy Learning".☆14Dec 4, 2025Updated 2 months ago
- Multi-Person Tracking in Tour Guide Robot☆10Aug 23, 2022Updated 3 years ago
- ☆10Nov 28, 2023Updated 2 years ago
- Cross-Self KV Cache Pruning for Efficient Vision-Language Inference☆10Dec 15, 2024Updated last year
- VideoAuteur: Towards Long Narrative Video Generation☆43Oct 22, 2025Updated 3 months ago
- CLIP-MoE: Mixture of Experts for CLIP☆55Oct 10, 2024Updated last year
- ☆17Nov 22, 2025Updated 2 months ago
- Multi-band Frequency Reconstruction for Neural Psychoacoustic Coding☆19May 5, 2025Updated 9 months ago
- Code for "Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning [EMNLP 2025 Finding]"☆15Aug 27, 2025Updated 5 months ago
- Click this --> https://zsdonghao.github.io☆10Feb 5, 2026Updated last week
- Traffic density estimation through traffic monitoring cameras.☆11May 21, 2023Updated 2 years ago
- A full CUDA implementation of the DCNv2 (deformable convolution) forward pass, with no dependency on cuTorch (THC).☆10Dec 9, 2019Updated 6 years ago