DAMO-NLP-SG / Inf-CLIP
The official CLIP training codebase of Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss". A highly memory-efficient CLIP training scheme.
☆164 · Updated last week
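The core idea behind Inf-CL is to avoid ever materializing the full B×B image-text similarity matrix: the contrastive (InfoNCE) loss is computed over tiles of the logits, with the log-sum-exp accumulated online, so memory grows with the tile size rather than the batch size. The sketch below is a minimal single-GPU illustration of that tiling idea, not the official implementation (which fuses the tiles in custom kernels and distributes them across GPUs); the function name and defaults are illustrative.

```python
import torch
import torch.nn.functional as F

def tiled_clip_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor,
                    temperature: float = 0.07, tile: int = 1024) -> torch.Tensor:
    """Image-to-text InfoNCE loss computed over column tiles, so the
    full B x B similarity matrix is never materialized at once."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    b = img_emb.shape[0]
    # Positive logits: the diagonal of the similarity matrix.
    pos = (img_emb * txt_emb).sum(dim=-1) / temperature           # (B,)
    # Online log-sum-exp accumulation over tiles (Flash-Attention style).
    run_max = torch.full((b,), float("-inf"), device=img_emb.device)
    run_sum = torch.zeros(b, device=img_emb.device)
    for start in range(0, b, tile):
        # One (B, <=tile) block of logits; only this block lives in memory.
        block = img_emb @ txt_emb[start:start + tile].T / temperature
        new_max = torch.maximum(run_max, block.max(dim=1).values)
        run_sum = run_sum * torch.exp(run_max - new_max) \
                  + torch.exp(block - new_max[:, None]).sum(dim=1)
        run_max = new_max
    lse = run_max + torch.log(run_sum)                            # (B,)
    # Cross-entropy with target i for row i is lse_i - pos_i.
    return (lse - pos).mean()
```

For a small batch this matches `F.cross_entropy(img_emb @ txt_emb.T / temperature, torch.arange(B))` up to floating-point error; note the sketch covers only the image-to-text direction, whereas CLIP averages it with the symmetric text-to-image term.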
Related projects
Alternatives and complementary repositories for Inf-CLIP
- Official implementation of the Law of Vision Representation in MLLMs ☆128 · Updated 2 months ago
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture ☆178 · Updated last month
- [NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models ☆227 · Updated last month
- ✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models ☆137 · Updated this week
- [CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale ☆194 · Updated 8 months ago
- LLaVA-HR: High-Resolution Large Language-Vision Assistant ☆213 · Updated 2 months ago
- [ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models ☆271 · Updated 3 months ago
- Official code for paper "Mantis: Multi-Image Instruction Tuning" ☆179 · Updated last week
- LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models ☆98 · Updated 5 months ago
- [NeurIPS 2024] Dense Connector for MLLMs ☆135 · Updated 3 weeks ago
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context ☆131 · Updated last month
- [CVPR 2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts ☆294 · Updated 3 months ago
- Long Context Transfer from Language to Vision ☆328 · Updated 2 weeks ago
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model ☆244 · Updated 4 months ago
- Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference ☆255 · Updated 2 months ago
- 📖 A repository organizing papers, code, and other resources related to unified multimodal models ☆205 · Updated this week
- [ICML 2024] MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI ☆93 · Updated 3 months ago
- When do we not need larger vision models? ☆333 · Updated 2 months ago
- This repo contains the code and data for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" ☆62 · Updated this week
- Implementation of PALI3 from the paper "PaLI-3 Vision Language Models: Smaller, Faster, Stronger" ☆144 · Updated last week
- E5-V: Universal Embeddings with Multimodal Large Language Models ☆168 · Updated 3 months ago
- Matryoshka Multimodal Models ☆81 · Updated last month
- VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ☆122 · Updated 2 weeks ago
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… ☆107 · Updated 4 months ago
- MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer ☆197 · Updated 7 months ago
- The official repository of the paper "What If We Recaption Billions of Web Images with LLaMA-3?" ☆120 · Updated 4 months ago
- [ICML'24 Oral] "MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions" ☆132 · Updated 2 weeks ago