[NeurIPS 2024] Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
☆72Feb 11, 2025Updated last year
Alternatives and similar repositories for LCL
Users that are interested in LCL are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Learning 1D Causal Visual Representation with De-focus Attention Networks☆35Jun 7, 2024Updated last year
- EVE Series: Encoder-Free Vision-Language Models from BAAI☆368Jul 24, 2025Updated 8 months ago
- [ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text☆415May 5, 2025Updated 11 months ago
- [ACM MM25] Official Pytorch implementation of [Decoupled Global-Local Alignment for Improving Compositional Understanding]☆16Jul 15, 2025Updated 8 months ago
- A collection of visual instruction tuning datasets.☆77Mar 14, 2024Updated 2 years ago
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click and start building anything your business needs.
- [TACL/EMNLP'24] Do Vision and Language Models Share Concepts? A Vector Space Alignment Study☆16Nov 22, 2024Updated last year
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception☆159Dec 6, 2024Updated last year
- ☆19Dec 6, 2023Updated 2 years ago
- Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models"☆46Apr 3, 2025Updated last year
- ☆31Jun 29, 2022Updated 3 years ago
- [EMNLP 2024] Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality☆22Oct 8, 2024Updated last year
- [AAAI 2026 Oral] The official code of "UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning"☆70Dec 8, 2025Updated 4 months ago
- ViCToR: Improving Visual Comprehension via Token Reconstruction for Pretraining LMMs☆29Aug 15, 2025Updated 7 months ago
- E5-V: Universal Embeddings with Multimodal Large Language Models☆274Dec 10, 2025Updated 4 months ago
- Managed Kubernetes at scale on DigitalOcean • AdDigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
- A Recipe for Building LLM Reasoners to Solve Complex Instructions☆31Oct 9, 2025Updated 6 months ago
- [ICLR 2025] IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model☆37Nov 27, 2024Updated last year
- [ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition&Understanding and General Relation Comprehension of …☆506Aug 9, 2024Updated last year
- [ICLR 2025] Diffusion Feedback Helps CLIP See Better☆301Jan 23, 2025Updated last year
- [ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"☆895Aug 13, 2024Updated last year
- Official repository for paper MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning(https://arxiv.org/abs/2406.17770).☆159Sep 27, 2024Updated last year
- LongAttn :Selecting Long-context Training Data via Token-level Attention☆15Jul 16, 2025Updated 8 months ago
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models☆53Jun 12, 2025Updated 9 months ago
- Official repository for the paper PLLaVA☆675Jul 28, 2024Updated last year
- Proton VPN Special Offer - Get 70% off • AdSpecial partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
- LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning☆77May 23, 2025Updated 10 months ago
- [CVPR 2023] implementation of Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information.☆91Jun 1, 2023Updated 2 years ago
- Gemstones: A Model Suite for Multi-Faceted Scaling Laws (NeurIPS 2025)☆34Sep 28, 2025Updated 6 months ago
- MMPD Dataset from ECCV'2024 "When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset"☆21Jul 15, 2024Updated last year
- [ICLR 2025] MLLM for On-Demand Spatial-Temporal Understanding at Arbitrary Resolution☆330Jul 4, 2025Updated 9 months ago
- OpenMMLab Detection Toolbox and Benchmark for V3Det☆15Apr 3, 2024Updated 2 years ago
- [EMNLP 2024] Official PyTorch implementation code for realizing the technical part of Traversal of Layers (TroL) presenting new propagati…☆99Jun 23, 2024Updated last year
- ☆16Jul 23, 2024Updated last year
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation☆460Dec 2, 2024Updated last year
- Simple, predictable pricing with DigitalOcean hosting • AdAlways know what you'll pay with monthly caps and flat pricing. Enterprise-grade infrastructure trusted by 600k+ customers.
- Fully Open Framework for Democratized Multimodal Reinforcement Learning.☆44Dec 19, 2025Updated 3 months ago
- FuseLIP: Multimodal Embeddings via Early Fusion of Discrete Tokens☆17Sep 8, 2025Updated 7 months ago
- ☆38Jul 9, 2024Updated last year
- [COLM'25] Official implementation of the Law of Vision Representation in MLLMs☆176Oct 6, 2025Updated 6 months ago
- A subset of YFCC100M. Tools, checking scripts and links of web drive to download datasets(uncompressed).☆19Nov 13, 2024Updated last year
- [ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions☆138May 8, 2025Updated 11 months ago
- [NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of…☆124Nov 25, 2024Updated last year