aimh-lab / visione

An AI-powered interactive video retrieval system

☆40

Alternatives and similar repositories for visione

Users who are interested in visione are comparing it to the repositories listed below.

SCZwangxiao / video-FlexReduc
Official implementation of paper AdaReTaKe: Adaptive Redundancy Reduction to Perceive Longer for Video-language Understanding
☆77 Updated 3 months ago
NVlabs / LITA
☆180 Updated 9 months ago
Hon-Wong / VoRA
[Fully open] [Encoder-free MLLM] Vision as LoRA
☆322 Updated last month
Vision-CAIR / LongVU
[ICML 2025] Official PyTorch implementation of LongVU
☆393 Updated 3 months ago
mbzuai-oryx / VideoGPT-plus
Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
☆280 Updated 3 weeks ago
mbzuai-oryx / LlamaV-o1
[ACL 2025 🔥] Rethinking Step-by-step Visual Reasoning in LLMs
☆305 Updated 2 months ago
aimagelab / LLaVA-MORE
LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning
☆145 Updated last week
line / lighthouse
[EMNLP2024 Demo], [ICASSP 2025] A user-friendly library for reproducible video moment retrieval and highlight detection. It also supports…
☆169 Updated 2 months ago
mbzuai-oryx / PALO
(WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H…
☆83 Updated this week
mbzuai-oryx / Video-LLaVA
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
☆257 Updated this week
bytedance / vidi
The official repo for "Vidi: Large Multimodal Models for Video Understanding and Editing"
☆126 Updated last month
bytedance / Shot2Story
A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.
☆148 Updated 6 months ago
SHI-Labs / CuMo
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts
☆152 Updated last year
Leon1207 / Video-RAG-master
This is the official implementation of our paper "Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension"
☆226 Updated 3 weeks ago
apple / ml-slowfast-llava
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
☆235 Updated 10 months ago
huggingface / fineVideo
☆76 Updated 10 months ago
tulip-berkeley / open_clip
An open source implementation of CLIP (With TULIP Support)
☆162 Updated 2 months ago
antoyang / VidChapters
[NeurIPS 2023 D&B] VidChapters-7M: Video Chapters at Scale
☆193 Updated last year
jyrao / SoccerAgent
[ACM Multimedia 2025] "Multi-Agent System for Comprehensive Soccer Understanding"
☆31 Updated last month
FreedomIntelligence / LongLLaVA
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
☆209 Updated 7 months ago
gls0425 / LinVT
LinVT: Empower Your Image-level Large Language Model to Understand Videos
☆82 Updated 7 months ago
OpenGVLab / VideoChat-Flash
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
☆451 Updated last month
sudo-Boris / mr-Blip
Official Implementation of "Chrono: A Simple Blueprint for Representing Time in MLLMs"
☆89 Updated 5 months ago
hyc2026 / StoryTeller
☆75 Updated 5 months ago
AIVIETNAMResearch / AI-City-2024-Track2
AICITY2024 Track 2 - Code from AIO_ISC Team
☆35 Updated last year
sandy1990418 / Finetune-Qwen2.5-VL
Fine-tuning Qwen2.5-VL for vision-language tasks | Optimized for Vision understanding | LoRA & PEFT support.
☆107 Updated 6 months ago
yeliudev / VideoMind
💡 VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning
☆239 Updated last month
andimarafioti / florence2-finetuning
Quick exploration into fine tuning florence 2
☆326 Updated 10 months ago
EasonXiao-888 / UVCOM
[CVPR 2024] Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection
☆100 Updated last year
MILVLG / imp
A family of highly capable yet efficient large multimodal models
☆186 Updated 11 months ago