Open source implementation of the paper "MM-Vid: Advancing Video Understanding with GPT-4V(ision)".
☆40Jan 4, 2026Updated 3 months ago
Alternatives and similar repositories for MM-VID
Users that are interested in MM-VID are comparing it to the libraries listed below.
- [AAAI2025] Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark☆27Apr 4, 2026Updated 3 weeks ago
- [NeurIPS2023] Exploring Diverse In-Context Configurations for Image Captioning☆44Nov 26, 2024Updated last year
- ☆20Sep 19, 2023Updated 2 years ago
- An in-context learning research testbed☆19Mar 16, 2025Updated last year
- [NeurIPS 2024] Visual Perception by Large Language Model’s Weights☆56Mar 31, 2025Updated last year
- Using a distilled CLIP model for deployment on Android devices☆20Feb 28, 2023Updated 3 years ago
- [NIPS 25'] Evaluation code of paper "KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models"☆44Oct 19, 2025Updated 6 months ago
- (NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment☆29Sep 27, 2024Updated last year
- [ICML 2025] Official code of "DAMA: Data- and Model-aware Alignment of Multi-modal LLMs"☆16May 24, 2025Updated 11 months ago
- We introduce Reasoning via Video, a new paradigm that uses maze-solving video generation to probe multimodal reasoning; our VR-Bench show…☆59Feb 4, 2026Updated 2 months ago
- ☆13Feb 25, 2025Updated last year
- iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models (ICLR2026)☆22Mar 29, 2026Updated last month
- EraseAnything, ICML 2025☆40Sep 28, 2025Updated 7 months ago
- ☆11May 24, 2024Updated last year
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning"☆13Jun 22, 2025Updated 10 months ago
- Cancer Detection App! This app is designed to assist in the early detection of skin cancer using cutting-edge AI technology. It's develop…☆10Jan 18, 2024Updated 2 years ago
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs☆56Mar 9, 2025Updated last year
- ☆13May 13, 2025Updated 11 months ago
- ☆25Jan 12, 2026Updated 3 months ago
- Generate Python docstrings automatically with LLM and syntax trees☆20Jun 13, 2025Updated 10 months ago
- ☆12Apr 25, 2025Updated last year
- ☆36Jan 9, 2026Updated 3 months ago
- ☆24Jun 18, 2025Updated 10 months ago
- Video Chain of Thought, Codes for ICML 2024 paper: "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition"☆181Feb 25, 2025Updated last year
- [ICLR'26] SPEED: Scalable, Precise, and Efficient Concept Erasure for Diffusion Models☆39Mar 9, 2026Updated last month
- [ICLR'26] Easier Painting Than Thinking: Can Text-to-Image Models Set the Stage, but Not Direct the Play?☆52Mar 9, 2026Updated last month
- ☆56Nov 21, 2024Updated last year
- ☆15Apr 25, 2023Updated 3 years ago
- A Toolkit for Video Action Recognition(Classification/Detection)☆17Mar 23, 2022Updated 4 years ago
- NeurIPS 2024: SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure Interpretation☆13May 24, 2025Updated 11 months ago
- COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark☆15Aug 22, 2024Updated last year
- PyTorch code for "ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning"☆21Oct 28, 2024Updated last year
- Remove Feishu watermarks☆11Jun 12, 2023Updated 2 years ago
- ☆28Jul 18, 2025Updated 9 months ago
- official code for "3D Question Answering via only 2D Vision-Language Models"☆23Mar 4, 2026Updated last month
- Official repository of the paper MPMQA: Multimodal Question Answering on Product Manuals (AAAI 2023)☆19Nov 28, 2022Updated 3 years ago
- Python 3 support for the MS COCO caption evaluation tools☆14Jun 14, 2024Updated last year
- This code is for the planar simulation of the RoboMaster AI Challenge 2020☆10May 10, 2020Updated 5 years ago
- Improved IPC for Electron☆12Nov 6, 2017Updated 8 years ago