yongliang-wu/MM-VID

yongliang-wu / MM-VID

Open source implementation of the paper "MM-Vid: Advancing Video Understanding with GPT-4V(ision)".

☆44

Alternatives and similar repositories for MM-VID

Users who are interested in MM-VID are comparing it to the libraries listed below.

yongliang-wu / Repurpose
[AAAI2025] Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark
☆31 · Apr 4, 2026 · Updated 3 months ago
yongliang-wu / NumPro
[CVPR2025] Number it: Temporal Grounding Videos like Flipping Manga
☆150 · Jan 19, 2026 · Updated 6 months ago
wenyu1009 / RTSRN
☆20 · Sep 19, 2023 · Updated 2 years ago
FeipengMa6 / VLoRA
[NeurIPS 2024] Visual Perception by Large Language Model’s Weights
☆56 · Mar 31, 2025 · Updated last year
OpenEnvision / AutoRubric-as-Reward
Auto-Rubric as Reward: From Implicit Preference to Explicit Generative Criteria
☆50 · Jul 2, 2026 · Updated 2 weeks ago
Kamichanw / ICLTestbed
An in-context learning research testbed
☆19 · Mar 16, 2025 · Updated last year
ForJadeForest / ImageSearchLightningCLIP
Using distilled CLIP model to deploy the android device
☆20 · Feb 28, 2023 · Updated 3 years ago
dhg-wei / TOPA
(NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment
☆29 · Sep 27, 2024 · Updated last year
yongliang-wu / DFT
[ICLR 2026] On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification.
☆587 · Jan 4, 2026 · Updated 6 months ago
JoyHuYY1412 / S4Former
Training Vision Transformers for Semi-Supervised Semantic Segmentation
☆16 · Nov 3, 2025 · Updated 8 months ago
injadlu / VCR
☆13 · Feb 25, 2025 · Updated last year
Ouxiang-Li / SPEED
[ICLR'26] SPEED: Scalable, Precise, and Efficient Concept Erasure for Diffusion Models
☆40 · Mar 9, 2026 · Updated 4 months ago
hulianyuyy / iLLaVA
iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models (ICLR2026)
☆23 · Jun 24, 2026 · Updated 3 weeks ago
eshoyuan / TrackGPT
TrackGPT: Track What You Need in Videos via Text Prompts
☆25 · May 16, 2023 · Updated 3 years ago
UMR-R / QMem
☆46 · May 16, 2026 · Updated 2 months ago
Vanixxz / BackMix
[TPAMI2025] BackMix: Regularizing Open Set Recognition by Removing Underlying Fore-Background Priors
☆16 · Apr 23, 2025 · Updated last year
mercurystraw / Kris_Bench
[NIPS 25'] Evaluation code of paper "KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models"
☆45 · Oct 19, 2025 · Updated 9 months ago
CASIA-IVA-Lab / VideoNIAH
VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
☆57 · Mar 9, 2025 · Updated last year
YujieLu10 / Seeker
☆11 · May 24, 2024 · Updated 2 years ago
HauffQian / DGAP
☆14 · May 13, 2025 · Updated last year
SooLab / CoTDet
[ICCV2023] CoTDet: Affordance Knowledge Prompting for Task Driven Object Detection
☆19 · Apr 23, 2025 · Updated last year
scofield7419 / Video-of-Thought
Video Chain of Thought, Codes for ICML 2024 paper: "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition"
☆182 · Feb 25, 2025 · Updated last year
PKU-YuanGroup / LLaVA-o1
☆57 · Nov 21, 2024 · Updated last year
xxiqiao / TROJail
Official implementation of "TROJail: Trajectory-Level Optimization for Multi-Turn Large Language Model Jailbreaks with Process Rewards"
☆31 · Updated this week
tonysy / PyAction
A Toolkit for Video Action Recognition (Classification/Detection)
☆17 · Mar 23, 2022 · Updated 4 years ago
jonathan-roberts1 / SciFIBench
NeurIPS 2024: SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure Interpretation
☆13 · May 24, 2025 · Updated last year
mair-lab / EARL
EARL: Editing with Autoregression and RL
☆43 · Nov 21, 2025 · Updated 8 months ago
waynelee-lwc / english-orignal-booklist
A small book list that collects PDFs of original English-language books on computer science and technology.
☆10 · Jan 13, 2022 · Updated 4 years ago
ZFancy / Unleashing-Mask
[ICML 2023] "Unleashing Mask: Explore the Intrinsic Out-of-Distribution Detection Capability"
☆18 · Jul 7, 2023 · Updated 3 years ago
xlite-dev / qwen-image-fast
⚡️Qwen-Image 4.8x🎉 speedup with Hybrid Acceleration for low VRAM GPUs
☆17 · Oct 24, 2025 · Updated 8 months ago
lin2025 / gpt4
LinGPT, a GPT-4 chat webpage in just a single HTML file: zero barrier to entry, set up in 10 seconds. Auto Key Rotation across multiple keys; supports proxy platforms and third-party keys. Supports proxy…
☆12 · Aug 28, 2023 · Updated 2 years ago
yolky / RCIG
☆15 · Apr 25, 2023 · Updated 3 years ago
JoakimHaurum / TokenReduction
Official PyTorch implementation of Which Tokens to Use? Investigating Token Reduction in Vision Transformers presented at ICCV 2023 NIVT …
☆35 · Aug 10, 2023 · Updated 2 years ago
EthanG97 / ImageDoctor
The official implementation for "ImageDoctor: Diagnosing Text-to-Image Generation via Grounded Image Reasoning"
☆15 · Mar 1, 2026 · Updated 4 months ago
hasura / business-data-benchmark
Business Data Benchmark (BDB) is a set of real-world questions to evaluate AI systems connected to business data.
☆25 · Dec 3, 2024 · Updated last year
Kamichanw / MimIC
[CVPR'25] Official code of paper "Mimic In-Context Learning for Multimodal Tasks"
☆26 · May 21, 2026 · Updated 2 months ago
spacedriveapp / native-deps
Spacedrive native dependencies
☆13 · Apr 8, 2025 · Updated last year
AdaCheng / VidEgoThink
The official code and data for paper "VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI"
☆18 · Mar 25, 2025 · Updated last year
sejong-rcv / PVLR
[ACM MM-24] Probabilistic Vision-Language Representation for Weakly Supervised Temporal Action Localization
☆13 · Oct 8, 2024 · Updated last year