Yanqing0327 / MLLMs-Augmented

The official implementation of "MLLMs-Augmented Visual-Language Representation Learning"

☆31

Alternatives and similar repositories for MLLMs-Augmented

Users who are interested in MLLMs-Augmented are comparing it to the repositories listed below.

whwu95 / FreeVA
FreeVA: Offline MLLM as Training-Free Video Assistant
☆65 Updated last year
LaVi-Lab / AIM
[ICCV 2025] Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning"
☆45 Updated 2 months ago
ant-research / DreamLIP
[ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions
☆136 Updated 7 months ago
Paranioar / UniPT
[CVPR2024] The code of "UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory"
☆68 Updated last year
linzhiqiu / visual_gpt_score
VisualGPTScore for visio-linguistic reasoning
☆27 Updated 2 years ago
longvideobench / LongVideoBench
[NeurIPS'24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.
☆111 Updated last year
HKUST-LongGroup / CoMM
Official repository for CoMM Dataset
☆48 Updated 11 months ago
wuw2019 / LoTLIP
[NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding
☆46 Updated 10 months ago
Yuqifan1117 / HalluciDoctor
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Accepted by CVPR 2024)
☆50 Updated last year
dhg-wei / TOPA
(NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment
☆30 Updated last year
ChocoWu / SeTok
Codes for ICLR 2025 Paper: Towards Semantic Equivalence of Tokenization in Multimodal LLM
☆75 Updated 7 months ago
callsys / ControlCap
[ECCV 2024] ControlCap: Controllable Region-level Captioning
☆80 Updated last year
TencentARC / GVT
Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".
☆58 Updated 2 years ago
FeipengMa6 / VLoRA
[NeurIPS 2024] Visual Perception by Large Language Model’s Weights
☆55 Updated 8 months ago
OpenGVLab / LCL
[NeurIPS 2024] Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
☆70 Updated 9 months ago
vinid / neg_clip
NegCLIP.
☆38 Updated 2 years ago
Share14 / ShareGemini
☆32 Updated last year
wusize / F-LMM
[CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models
☆109 Updated 6 months ago
joez17 / VideoNIAH
VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
☆51 Updated 9 months ago
BAAI-DCAI / DataOptim
A collection of visual instruction tuning datasets.
☆76 Updated last year
takomc / amp
[NeurIPS 2024] The official code of the paper "Automated Multi-level Preference for MLLMs"
☆20 Updated last year
Visual-AI / PruneVid
[ACL 2025] PruneVid: Visual Token Pruning for Efficient Video Large Language Models
☆60 Updated 6 months ago
Yaxin9Luo / Gamma-MOD
[ICLR2025] γ-MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models
☆40 Updated last month
OpenGVLab / MMIU
[ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
☆90 Updated last year
Shengcao-Cao / groundLMM
Emergent Visual Grounding in Large Multimodal Models Without Grounding Supervision
☆41 Updated last month
AoiDragon / POPE
[EMNLP'23] The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models"
☆99 Updated 3 months ago
DCDmllm / Momentor
☆80 Updated last year
haoyu-bu / CAFe
Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning"
☆24 Updated 8 months ago
yaolinli / DeCo
Code for DeCo: Decoupling token compression from semantic abstraction in multimodal large language models
☆74 Updated 4 months ago
foundation-multimodal-models / CAL
[NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment
☆58 Updated last year