OpenGVLab/LCL

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/OpenGVLab/LCL)

OpenGVLab / LCL

[NeurIPS 2024] Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning

☆72

Alternatives and similar repositories for LCL

Users that are interested in LCL are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

OpenGVLab / De-focus-Attention-Networks
View on GitHub
Learning 1D Causal Visual Representation with De-focus Attention Networks
☆35Jun 7, 2024Updated 2 years ago
baaivision / EVE
View on GitHub
EVE Series: Encoder-Free Vision-Language Models from BAAI
☆374Jul 24, 2025Updated 11 months ago
OpenGVLab / OmniCorpus
View on GitHub
[ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
☆425May 5, 2025Updated last year
BAAI-DCAI / DataOptim
View on GitHub
A collection of visual instruction tuning datasets.
☆77Mar 14, 2024Updated 2 years ago
mightyzau / InfMLLM
View on GitHub
☆19Dec 6, 2023Updated 2 years ago
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
baaivision / DenseFusion
View on GitHub
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
☆159Dec 6, 2024Updated last year
jiaangli / VILA
View on GitHub
[TACL/EMNLP'24] Do Vision and Language Models Share Concepts? A Vector Space Alignment Study
☆16Nov 22, 2024Updated last year
RammusLeo / ScoreHOI
View on GitHub
Official repository of ScoreHOI (ICCV 2025)
☆16Dec 21, 2025Updated 7 months ago
GaryGuTC / UniME-v2
View on GitHub
[AAAI 2026 Oral] The official code of "UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning"
☆74Dec 8, 2025Updated 7 months ago
fundamentalvision / UniGrad
View on GitHub
☆31Jun 29, 2022Updated 4 years ago
ytaek-oh / fsc-clip
View on GitHub
[EMNLP 2024] Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality
☆22Oct 8, 2024Updated last year
deepglint / Victor
View on GitHub
ViCToR: Improving Visual Comprehension via Token Reconstruction for Pretraining LMMs
☆29Aug 15, 2025Updated 11 months ago
xiaoxing2001 / DeGLA
View on GitHub
[ACM MM25] Official Pytorch implementation of [Decoupled Global-Local Alignment for Improving Compositional Understanding]
☆16Jul 15, 2025Updated last year
kongds / E5-V
View on GitHub
E5-V: Universal Embeddings with Multimodal Large Language Models
☆275Dec 10, 2025Updated 7 months ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
OpenGVLab / all-seeing
View on GitHub
[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition&Understanding and General Relation Comprehension of …
☆507Aug 9, 2024Updated last year
jiyt17 / IDA-VLM
View on GitHub
[ICLR 2025] IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model
☆37Nov 27, 2024Updated last year
beichenzbc / Long-CLIP
View on GitHub
[ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"
☆900Aug 13, 2024Updated last year
haoxiangzhao12138 / PLUME
View on GitHub
[ACMMM 2026] PLUME: Latent Reasoning Based Universal Multimodal Embedding
☆24Apr 29, 2026Updated 2 months ago
baaivision / DIVA
View on GitHub
[ICLR 2025] Diffusion Feedback Helps CLIP See Better
☆301Jan 23, 2025Updated last year
AdamRain / YFCC15M_downloader
View on GitHub
A subset of YFCC100M. Tools, checking scripts and links of web drive to download datasets(uncompressed).
☆19Nov 13, 2024Updated last year
OpenGVLab / PVC
View on GitHub
[CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
☆54Jun 12, 2025Updated last year
PhoenixZ810 / MG-LLaVA
View on GitHub
Official repository for paper MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning(https://arxiv.org/abs/2406.17770).
☆160Sep 27, 2024Updated last year
MCG-NJU / VideoEval
View on GitHub
VideoEval: Comprehensive Benchmark Suite for Low-Cost Evaluation of Video Foundation Model
☆15Jul 31, 2025Updated 11 months ago
GPUs on demand by Runpod - Special Offer Available • Ad
Run AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
OpenGVLab / M3I-Pretraining
View on GitHub
[CVPR 2023] implementation of Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information.
☆91Jun 1, 2023Updated 3 years ago
XMUDeepLIT / LLaVE
View on GitHub
LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning
☆78May 23, 2025Updated last year
jin-s13 / MMPD-Dataset
View on GitHub
MMPD Dataset from ECCV'2024 "When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset"
☆21Jul 15, 2024Updated 2 years ago
V3Det / mmdetection-V3Det
View on GitHub
OpenMMLab Detection Toolbox and Benchmark for V3Det
☆15Apr 3, 2024Updated 2 years ago
Oryx-mllm / Oryx
View on GitHub
[ICLR 2025] MLLM for On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
☆329Jul 4, 2025Updated last year
ByungKwanLee / TroL
View on GitHub
[EMNLP 2024] Official PyTorch implementation code for realizing the technical part of Traversal of Layers (TroL) presenting new propagati…
☆99Jun 23, 2024Updated 2 years ago
RunpeiDong / DreamLLM
View on GitHub
[ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation
☆462Dec 2, 2024Updated last year
mcleish7 / gemstone-scaling-laws
View on GitHub
Gemstones: A Model Suite for Multi-Faceted Scaling Laws (NeurIPS 2025)
☆35Sep 28, 2025Updated 9 months ago
bronyayang / Law_of_Vision_Representation_in_MLLMs
View on GitHub
[COLM'25] Official implementation of the Law of Vision Representation in MLLMs
☆176Oct 6, 2025Updated 9 months ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
yale-nlp / refdpo
View on GitHub
☆16Jul 23, 2024Updated last year
ant-research / DreamLIP
View on GitHub
[ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions
☆138May 8, 2025Updated last year
zhourax / VEGA
View on GitHub
☆38Jul 9, 2024Updated 2 years ago
FreedomIntelligence / ALLaVA
View on GitHub
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
☆281Jun 25, 2024Updated 2 years ago
OpenGVLab / MM-NIAH
View on GitHub
[NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of…
☆126Nov 25, 2024Updated last year
thunlp / LLaVA-UHD
View on GitHub
LLaVA-UHD v3: Progressive Visual Compression for Efficient Native-Resolution Encoding in MLLMs
☆423Jul 6, 2026Updated 2 weeks ago
ziqipang / LM4VisualEncoding
View on GitHub
[ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers"
☆244Jun 29, 2026Updated 3 weeks ago