showlab / VideoLISA

[NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos

☆138

Alternatives and similar repositories for VideoLISA

Users who are interested in VideoLISA are comparing it to the repositories listed below.

mbzuai-oryx / VideoGLaMM
[CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
☆86 Updated 6 months ago
cilinyan / VISA
[ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model
☆192 Updated last year
callsys / ControlCap
[ECCV 2024] ControlCap: Controllable Region-level Captioning
☆79 Updated last year
Rubics-Xuan / MRES
This repo holds the official code and data for "Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentati…
☆72 Updated last year
WHB139426 / Grounded-Video-LLM
[EMNLP 2025 Findings] Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models
☆130 Updated 2 months ago
Visual-AI / PruneVid
[ACL 2025] PruneVid: Visual Token Pruning for Efficient Video Large Language Models
☆55 Updated 5 months ago
yongliang-wu / NumPro
[CVPR2025] Number it: Temporal Grounding Videos like Flipping Manga
☆123 Updated 2 weeks ago
zhang9302002 / ThinkingWithVideos
The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning"
☆48 Updated last week
buxiangzhiren / VD-IT
Code for the paper "Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation", ECCV 2024
☆43 Updated last year
hshjerry / VideoEspresso
[CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
☆124 Updated 2 months ago
Haochen-Wang409 / ross
[ICLR'25] Reconstructive Visual Instruction Tuning
☆121 Updated 6 months ago
ziplab / LongVLM
☆104 Updated last year
Becomebright / ReKV
Official PyTorch Code of ReKV (ICLR'25)
☆61 Updated 7 months ago
appletea233 / Temporal-R1
Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency
☆55 Updated 4 months ago
Hon-Wong / Elysium
[ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM
☆86 Updated 11 months ago
congvvc / InstructSeg
[ICCV 2025] Official implementation of "InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models"
☆48 Updated 8 months ago
FeipengMa6 / VLoRA
[NeurIPS 2024] Visual Perception by Large Language Model’s Weights
☆52 Updated 6 months ago
haochenheheda / LVVIS
Large-Vocabulary Video Instance Segmentation dataset
☆95 Updated last year
wdrink / OpenTokenizer
☆21 Updated 9 months ago
lizhou-cs / mglmm
☆32 Updated last year
wusize / F-LMM
[CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models
☆103 Updated 4 months ago
OpenGVLab / VideoChat-R1
[NIPS2025] VideoChat-R1 & R1.5: Enhancing Spatio-Temporal Perception and Reasoning via Reinforcement Fine-Tuning
☆215 Updated this week
rese1f / aurora
[ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
☆129 Updated 4 months ago
TimeMarker-LLM / TimeMarker
A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability
☆99 Updated 10 months ago
SuleBai / SC-CLIP
Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation
☆56 Updated 5 months ago
hmxiong / StreamChat
Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" ICLR2025
☆78 Updated 7 months ago
eric-ai-lab / GRIT
Official code for NeurIPS 2025 paper "GRIT: Teaching MLLMs to Think with Images"
☆152 Updated last week
ProvenceStar / PartGLEE
[ECCV2024] PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects
☆52 Updated last year
hanghuacs / FineCaption
☆37 Updated 4 months ago
rkzheng99 / ViLLa
Video Reasoning Segmentation
☆25 Updated 10 months ago