gls0425/LinVT



LinVT: Empower Your Image-level Large Language Model to Understand Videos

☆83

Alternatives and similar repositories for LinVT

Users who are interested in LinVT are comparing it to the repositories listed below.


iLearn-Lab / ACL25-AdaReTaKe
Official implementation of the paper AdaReTaKe: Adaptive Redundancy Reduction to Perceive Longer for Video-language Understanding
☆91 · Apr 21, 2026 · Updated 3 months ago
OpenGVLab / VideoChat-Flash
[ICLR2026] VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
☆526 · Updated this week
CeeZh / SILVR
Official implementation of "SiLVR: A Simple Language-based Video Reasoning Framework"
☆19 · Jan 18, 2026 · Updated 6 months ago
fansunqi / AKeyS
Agentic Keyframe Search for Video Question Answering
☆18 · Jun 30, 2026 · Updated 3 weeks ago
TimeMarker-LLM / TimeMarker
A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability
☆107 · Nov 28, 2024 · Updated last year
steven-ccq / ViLAMP
[ICML 2025] Official repository for the paper "Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation"
☆194 · Sep 23, 2025 · Updated 9 months ago
farewellthree / PPLLaVA
Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance"
☆133 · Nov 19, 2024 · Updated last year
IVGSZ / Flash-VStream
Official implementation of the ICCV 2025 paper "Flash-VStream: Efficient Real-Time Understanding for Long Video Streams"
☆285 · Oct 15, 2025 · Updated 9 months ago
bytedance / tarsier
Tarsier: a family of large-scale video-language models designed to generate high-quality video descriptions, together with g…
☆547 · Aug 14, 2025 · Updated 11 months ago
DCDmllm / Momentor
☆81 · Nov 24, 2024 · Updated last year
rohit901 / VANE-Bench
[NAACL'25] Contains code and documentation for our VANE-Bench paper.
☆24 · Aug 19, 2025 · Updated 11 months ago
OpenGVLab / PVC
[CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
☆54 · Jun 12, 2025 · Updated last year
SCZwangxiao / video-ReTaKe
Official implementation of the paper ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding
☆40 · Mar 16, 2025 · Updated last year
apple / ml-slowfast-llava
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
☆291 · Sep 16, 2024 · Updated last year
hshjerry / VideoEspresso
[CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
☆140 · Jul 28, 2025 · Updated 11 months ago
egoschema / EgoSchema
☆117 · Dec 30, 2024 · Updated last year
ruili33 / TPO
☆41 · Sep 9, 2025 · Updated 10 months ago
nusnlp / d2vlm
[ICCV 2025] Factorized Learning for Temporally Grounded Video-Language Models
☆24 · Apr 18, 2026 · Updated 3 months ago
yunzhuzhang0918 / flexselect
The official repository for the paper "FlexSelect: Flexible Token Selection for Efficient Long Video Understanding".
☆31 · Sep 19, 2025 · Updated 10 months ago
ziqipang / MR-Video
MR. Video: MapReduce is the Principle for Long Video Understanding
☆31 · Jun 18, 2026 · Updated last month
jongwoopark7978 / LVNet
[Main Conference @ EACL'26] [Workshop @ NeurIPS'24] 🎞️ LVNet.
☆44 · Feb 10, 2026 · Updated 5 months ago
Vision-CAIR / LongVU
[ICML 2025] Official PyTorch implementation of LongVU
☆429 · May 8, 2025 · Updated last year
DAMO-NLP-SG / VideoLLaMA3
Frontier Multimodal Foundation Models for Image and Video Understanding
☆1,172 · Aug 14, 2025 · Updated 11 months ago
MAC-AutoML / WFS-SB
[CVPR 2026] Wavelet-based Frame Selection by Detecting Semantic Boundary for Long Video Understanding
☆31 · Apr 12, 2026 · Updated 3 months ago
thu-nics / FrameFusion
[ICCV'25] The official code for the paper "Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models"
☆76 · Jan 13, 2026 · Updated 6 months ago
appletea233 / Temporal-R1
Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency
☆62 · Jun 6, 2025 · Updated last year
DocTron-hub / FD-RL
[CVPR 2026] Reading or Reasoning? Format Decoupled Reinforcement Learning for Document OCR
☆18 · Mar 23, 2026 · Updated 3 months ago
TencentARC / Video-Holmes
[ECCV 2026] Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?
☆95 · Jul 13, 2025 · Updated last year
OpenGVLab / VideoChat-R1
[NIPS2025] VideoChat-R1 & R1.5: Enhancing Spatio-Temporal Perception and Reasoning via Reinforcement Fine-Tuning
☆268 · Oct 18, 2025 · Updated 9 months ago
marinero4972 / Open-o3-Video
[ICML 2026] Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"
☆157 · May 1, 2026 · Updated 2 months ago
josephzpng / DisTime
DisTime: Distribution-based Time Representation for Video Large Language Models.
☆21 · Jul 10, 2025 · Updated last year
yunlong10 / CAT-V
[AAAI 26 Demo] Official repo for CAT-V - Caption Anything in Video: Object-centric Dense Video Captioning with Spatiotemporal Multimodal P…
☆67 · Jan 27, 2026 · Updated 5 months ago
tingyu215 / TS-LLaVA
TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models
☆17 · Jan 2, 2025 · Updated last year
InternRobotics / EgoThinker
Official implementation of EgoThinker at NIPS 2025
☆29 · Nov 25, 2025 · Updated 7 months ago
Tencent-QQMM / Video-CCAM
A lightweight, flexible Video-MLLM developed by the TencentQQ Multimedia Research Team.
☆73 · Oct 14, 2024 · Updated last year
sming256 / BOLT
[CVPR2025] BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding
☆55 · Feb 5, 2026 · Updated 5 months ago
Ziyang412 / VideoTree
Code for the CVPR25 paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"
☆165 · Jun 23, 2025 · Updated last year
alibaba-damo-academy / PixelRefer
The code for PixelRefer & VideoRefer
☆352 · Nov 16, 2025 · Updated 8 months ago
NVlabs / FRAG
☆15 · Apr 25, 2025 · Updated last year