[NeurIPS 2024] Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
☆72Feb 11, 2025Updated last year
Alternatives and similar repositories for LCL
Users that are interested in LCL are comparing it to the libraries listed below
Sorting:
- EVE Series: Encoder-Free Vision-Language Models from BAAI☆368Jul 24, 2025Updated 7 months ago
- [ACM MM25] Official Pytorch implementation of [Decoupled Global-Local Alignment for Improving Compositional Understanding]☆15Jul 15, 2025Updated 7 months ago
- Learning 1D Causal Visual Representation with De-focus Attention Networks☆35Jun 7, 2024Updated last year
- [ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text☆412May 5, 2025Updated 9 months ago
- [TACL] Do Vision and Language Models Share Concepts? A Vector Space Alignment Study☆16Nov 22, 2024Updated last year
- [EMNLP 2024] Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality☆21Oct 8, 2024Updated last year
- Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models"☆45Apr 3, 2025Updated 10 months ago
- Gemstones: A Model Suite for Multi-Faceted Scaling Laws (NeurIPS 2025)☆33Sep 28, 2025Updated 4 months ago
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception☆159Dec 6, 2024Updated last year
- ☆16Jul 23, 2024Updated last year
- A collection of visual instruction tuning datasets.☆76Mar 14, 2024Updated last year
- Unifying Specialized Visual Encoders for Video Language Models☆25Nov 22, 2025Updated 3 months ago
- [ICLR 2025] IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model☆37Nov 27, 2024Updated last year
- ☆19Dec 6, 2023Updated 2 years ago
- LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning☆75May 23, 2025Updated 9 months ago
- Official repository for paper MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning(https://arxiv.org/abs/2406.17770).☆159Sep 27, 2024Updated last year
- E5-V: Universal Embeddings with Multimodal Large Language Models☆274Dec 10, 2025Updated 2 months ago
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models☆51Jun 12, 2025Updated 8 months ago
- [EMNLP 2024] Official PyTorch implementation code for realizing the technical part of Traversal of Layers (TroL) presenting new propagati…☆99Jun 23, 2024Updated last year
- LongAttn :Selecting Long-context Training Data via Token-level Attention☆15Jul 16, 2025Updated 7 months ago
- ChineseCLIP using online learning☆13Nov 7, 2022Updated 3 years ago
- [AAAI 2026 Oral] The official code of "UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning"☆64Dec 8, 2025Updated 2 months ago
- [ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition&Understanding and General Relation Comprehension of …☆505Aug 9, 2024Updated last year
- ViCToR: Improving Visual Comprehension via Token Reconstruction for Pretraining LMMs☆28Aug 15, 2025Updated 6 months ago
- [ICLR 2025] Diffusion Feedback Helps CLIP See Better☆299Jan 23, 2025Updated last year
- ☆10Apr 3, 2023Updated 2 years ago
- ☆11Apr 21, 2025Updated 10 months ago
- LMM for VQA, tcsvt version☆11Jul 19, 2024Updated last year
- ☆124Jul 29, 2024Updated last year
- ☆53Jul 23, 2024Updated last year
- T2I-Copilot: A Training-Free Multi-Agent Text-to-Image System for Enhanced Prompt Interpretation and Interactive Generation (ICCV'25)☆42Oct 6, 2025Updated 4 months ago
- [ICLR 2025] MLLM for On-Demand Spatial-Temporal Understanding at Arbitrary Resolution☆331Jul 4, 2025Updated 7 months ago
- [NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of…☆123Nov 25, 2024Updated last year
- FuseLIP: Multimodal Embeddings via Early Fusion of Discrete Tokens☆17Sep 8, 2025Updated 5 months ago
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment☆16Dec 19, 2024Updated last year
- ☆21Jul 21, 2025Updated 7 months ago
- The official repo for the DanQing dataset.☆29Jan 16, 2026Updated last month
- [ACL 2025 🔥] Time Travel is a Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts☆18May 22, 2025Updated 9 months ago
- Code and data for UniEgoMotion (ICCV 2025)☆44Nov 11, 2025Updated 3 months ago