Share14 / ShareGemini

☆31

Alternatives and similar repositories for ShareGemini

Users who are interested in ShareGemini are comparing it to the repositories listed below.

joez17 / VideoNIAH
VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
☆49 Updated 7 months ago
MengLcool / DeepStack-VL
[NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…
☆61 Updated last year
TencentARC / GVT
Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".
☆58 Updated 2 years ago
whwu95 / FreeVA
FreeVA: Offline MLLM as Training-Free Video Assistant
☆64 Updated last year
PolyU-ChenLab / ETBench
👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024)
☆66 Updated 9 months ago
patrick-tssn / VideoHallucer
VideoHallucer: the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs)
☆37 Updated 6 months ago
longvideobench / LongVideoBench
[NeurIPS '24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.
☆110 Updated last year
DCDmllm / Momentor
☆80 Updated 11 months ago
showlab / MovieSeq
[ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences
☆40 Updated 7 months ago
mshukor / ima-lmms
[NeurIPS2024] Official code for (IMA) Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs
☆21 Updated last year
RifleZhang / LLaVA-Hound-DPO
☆155 Updated 11 months ago
llyx97 / TempCompass
[ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …
☆124 Updated 6 months ago
OpenGVLab / LCL
[NeurIPS 2024] Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
☆70 Updated 8 months ago
mu-cai / TemporalBench
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models
☆37 Updated 11 months ago
Liuziyu77 / MIA-DPO
Official implementation of MIA-DPO
☆66 Updated 9 months ago
rese1f / aurora
[ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
☆129 Updated 4 months ago
X2FD / LVIS-INSTRUCT4V
☆133 Updated last year
rxtan2 / Koala-video-llm
☆36 Updated last year
RUCAIBox / Event-Bench
Official code of *Towards Event-oriented Long Video Understanding*
☆12 Updated last year
showlab / cosmo
☆72 Updated last year
PVIT-official / PVIT
Repository of paper: Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models
☆37 Updated 2 years ago
SliMM-X / CoMP-MM
Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models"
☆32 Updated 6 months ago
jiyt17 / IDA-VLM
[ICLR 2025] IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model
☆36 Updated 10 months ago
VidCapBench / VidCapBench
☆11 Updated 5 months ago
Hon-Wong / ByteVideoLLM
[ICCV 2025] Dynamic-VLM
☆25 Updated 10 months ago
yangjie-cv / WeThink
WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning
☆35 Updated 4 months ago
TencentARC / FLM
Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)
☆32 Updated 2 years ago
RUCAIBox / ComVint
The official GitHub page for ''What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Ins…
☆19 Updated last year
si0wang / VisVM
☆45 Updated 9 months ago
JoeLeelyf / OVO-Bench
[CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
☆93 Updated 3 months ago