princetonvisualai/merv



Unifying Specialized Visual Encoders for Video Language Models

☆25

Alternatives and similar repositories for merv

Users who are interested in merv are comparing it to the repositories listed below.


mahtabbigverdi / Aurora
☆12 · Dec 4, 2024 · Updated last year
jiazheng-xing / SloshNet
[AAAI 2023] Revisiting the Spatial and Temporal Modeling for Few-shot Action Recognition (SloshNet)
☆14 · Jan 10, 2024 · Updated 2 years ago
soCzech / MultiTaskObjectStates
Code for the paper "Multi-Task Learning of Object States and State-Modifying Actions from Web Videos" published in TPAMI
☆11 · Mar 3, 2024 · Updated 2 years ago
Yui010206 / MEXA
[EMNLP 2025 Findings] MEXA: Towards General Multimodal Reasoning with Dynamic Multi-Expert Aggregation
☆15 · Aug 22, 2025 · Updated 11 months ago
vgbench / VGBench
☆19 · Sep 19, 2024 · Updated last year
HJYao00 / MMReason
[ICCV 2025] MMReason, MLLMs, step by step, reasoning benchmark, AGI
☆15 · Apr 25, 2026 · Updated 3 months ago
jh-yi / Video-Panda
Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models [CVPR 2025]
☆81 · Jun 24, 2025 · Updated last year
Hypnosx / Kinetics-TPS
ICCV DeeperAction Challenge: Kinetics-TPS Challenge on Part-level Action Parsing and Action Recognition.
☆14 · Jun 4, 2021 · Updated 5 years ago
UCSB-AI / MMWorld
Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"
☆28 · Jul 15, 2025 · Updated last year
evelinehong / 3D-Concept-Grounding
Code release of "3D Concept Grounding on Neural Fields" (NeurIPS 2022)
☆15 · Feb 13, 2023 · Updated 3 years ago
Hritikbansal / videocon
☆58 · Apr 24, 2024 · Updated 2 years ago
BRZ911 / ViTCoT
[ACM MM 2025] ViTCoT: Video-Text Interleaved Chain-of-Thought for Boosting Video Understanding in Large Language Models
☆18 · Jul 15, 2025 · Updated last year
TuringEyeTest / TuringEyeTest
Pixels, Patterns, but no Poetry: To See the World like Humans
☆18 · Aug 11, 2025 · Updated 11 months ago
rxtan2 / Koala-video-llm
☆37 · Sep 16, 2024 · Updated last year
Accio-Lab / SwimBird
☆18 · Apr 9, 2026 · Updated 3 months ago
rish-16 / dalle2-pytorch
Unofficial PyTorch implementation of DALL-E 2 by OpenAI
☆10 · Apr 6, 2022 · Updated 4 years ago
SHI-Labs / VisPer-LM
[NeurIPS 2025] Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation
☆73 · Oct 17, 2025 · Updated 9 months ago
Svardfox / LaViT
Official codebase for the paper LaViT
☆34 · Feb 15, 2026 · Updated 5 months ago
SalesforceAIResearch / strefer
Strefer: Empowering Video LLMs with Space-Time Referring and Reasoning via Synthetic Instruction Data
☆19 · Jun 2, 2026 · Updated last month
d-ailin / CLIP-Guided-Decoding
☆18 · Aug 1, 2024 · Updated last year
Hoar012 / TDC-Video
Official implementation of TDC.
☆15 · Jul 22, 2025 · Updated last year
TIGER-AI-Lab / VISTA
The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR 2025]
☆20 · Feb 27, 2025 · Updated last year
chengyou-jia / T2IS
Official Repo for "Why Settle for One? Text-to-ImageSet Generation and Evaluation"
☆21 · Oct 1, 2025 · Updated 9 months ago
ybb6 / laser
☆34 · Apr 22, 2026 · Updated 3 months ago
OpenGVLab / VKnowU
[ECCV 2026] VKnowU: Evaluating Visual Knowledge Understanding in Multimodal LLMs
☆16 · Feb 3, 2026 · Updated 5 months ago
Jieqianyu / PANet
[IROS 2023] PANet: LiDAR Panoptic Segmentation with Sparse Instance Proposal and Aggregation
☆25 · Jun 28, 2023 · Updated 3 years ago
X-GenGroup / PaCo-RL
Official Implementation for *PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling*
☆42 · Dec 13, 2025 · Updated 7 months ago
xcltql666 / DenseDiT
Code for "From Ideal to Real: Unified and Data-Efficient Dense Prediction for Real-World Scenarios"
☆27 · Jun 7, 2026 · Updated last month
alibaba-damo-academy / VL-Cogito
☆24 · Nov 4, 2025 · Updated 8 months ago
AndongDeng / BEAR
BEAR: a new BEnchmark on video Action Recognition
☆46 · Apr 21, 2024 · Updated 2 years ago
coolbay / Re2TAL
Repository for the CVPR 2023 paper Re^2TAL
☆13 · Nov 21, 2025 · Updated 8 months ago
Xujxyang / OpenTrans
☆26 · Apr 17, 2024 · Updated 2 years ago
DAMO-NLP-SG / CMM
The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
☆54 · Jul 11, 2025 · Updated last year
farewellthree / PPLLaVA
Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance"
☆133 · Nov 19, 2024 · Updated last year
ArmelRandy / tree-of-problems
[EMNLP 2024] Tree of Problems: Improving structured problem solving with compositionality
☆20 · Mar 4, 2025 · Updated last year
Tencent-QQMM / Video-CCAM
A lightweight, flexible Video-MLLM developed by the Tencent QQ Multimedia Research Team.
☆74 · Oct 14, 2024 · Updated last year
TimeMarker-LLM / TimeMarker
A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability
☆107 · Nov 28, 2024 · Updated last year
edorado93 / HMM-Part-of-Speech-Tagger
An HMM-based part-of-speech tagger
☆10 · May 30, 2018 · Updated 8 years ago
Jieqianyu / SSC-RS
[IROS 2023] SSC-RS: Elevate LiDAR Semantic Scene Completion with Representation Separation and BEV Fusion
☆35 · Jun 28, 2023 · Updated 3 years ago