PKU-YuanGroup / LLMBindLinks

LLMBind: A Unified Modality-Task Integration Framework

☆17

Alternatives and similar repositories for LLMBind

Users that are interested in LLMBind are comparing it to the libraries listed below

Sorting:

haoyu-bu / CAFe
Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning"
☆19Updated 3 months ago
showlab / MovieSeq
[ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences
☆39Updated 4 months ago
Liuziyu77 / MIA-DPO
Official implement of MIA-DPO
☆59Updated 5 months ago
PKU-YuanGroup / Look-Back
This repository is the official implementation of "Look-Back: Implicit Visual Re-focusing in MLLM Reasoning".
☆29Updated last week
TencentARC / Video-Holmes
Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?
☆60Updated this week
whwu95 / FreeVA
FreeVA: Offline MLLM as Training-Free Video Assistant
☆60Updated last year
TencentARC / TokLIP
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
☆101Updated last month
ruili33 / TPO
☆34Updated 5 months ago
MME-Benchmarks / MME-Unify
MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
☆41Updated 3 months ago
Visual-AI / PruneVid
The official repository for ACL2025 paper "PruneVid: Visual Token Pruning for Efficient Video Large Language Models".
☆49Updated 2 months ago
AV-Odyssey / AV-Odyssey
This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?"
☆26Updated 6 months ago
ncTimTang / AKS
[CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding
☆80Updated 2 months ago
Yaxin9Luo / Gamma-MOD
[ICLR2025] γ -MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models
☆37Updated 5 months ago
hshjerry / VideoEspresso
[CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
☆94Updated last month
ChocoWu / SeTok
Codes for ICLR 2025 Paper: Towards Semantic Equivalence of Tokenization in Multimodal LLM
☆67Updated 2 months ago
qiujihao19 / Artemis
☆26Updated 3 months ago
PhoenixZ810 / RISEBench
Official Repository of paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing
☆69Updated last week
hanghuacs / FineCaption
☆34Updated 3 weeks ago
Tencent / HaploVLM
ICML2025
☆51Updated last month
zhangquanchen / VisRL
[ICCV 2025] VisRL: Intention-Driven Visual Perception via Reinforced Reasoning
☆33Updated last month
SCZwangxiao / video-ReTaKe
Official implementation of paper ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding
☆34Updated 4 months ago
xjtupanda / Sparrow
Repo for paper "T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs"
☆49Updated 4 months ago
RenShuhuai-Andy / NBP
Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling
☆37Updated 5 months ago
VIStA-H / GPT-4V_Social_Media
GPT-4V(ision) as A Social Media Analysis Engine
☆37Updated 6 months ago
mengchuang123 / VASparse-github
[CVPR 2025] VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification
☆34Updated 3 months ago
rese1f / aurora
[ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
☆113Updated last month
Becomebright / ReKV
Official PyTorch Code of ReKV (ICLR'25)
☆35Updated 4 months ago
WHB139426 / Grounded-Video-LLM
Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models
☆117Updated 3 months ago
SilentView / LVD-2M
[NeurIPS 2024 D&B Track] Official Repo for "LVD-2M: A Long-take Video Dataset with Temporally Dense Captions"
☆63Updated 9 months ago
OpenGVLab / MMIU
[ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
☆81Updated 10 months ago