[CVPR 2025 Highlight] The official CLIP training codebase of Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss". A super memory-efficiency CLIP training scheme.
☆282Jan 16, 2025Updated last year
Alternatives and similar repositories for Inf-CLIP
Users that are interested in Inf-CLIP are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- ✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio☆52Jul 11, 2025Updated 8 months ago
- [AAAI 2025] Open-vocabulary Video Instance Segmentation Codebase built upon Detectron2, which is really easy to use.☆26Dec 30, 2024Updated last year
- [CVPR 2023] Out-of-Distributed Semantic Pruning for Robust Semi-Supervised Learning☆22Jun 11, 2023Updated 2 years ago
- Fuzzy Positive Learning (CVPR2023)☆15Jul 25, 2024Updated last year
- [CVPR 2023 Highlight & TPAMI] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning☆126Dec 28, 2024Updated last year
- [ICCV 2025 Highlight] The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining"☆196Mar 17, 2025Updated last year
- [ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization☆43Feb 27, 2025Updated last year
- LLMBind: A Unified Modality-Task Integration Framework☆19Jun 16, 2024Updated last year
- Run Effective Large Batch Contrastive Learning Beyond GPU/TPU Memory Constraint☆430Mar 26, 2024Updated last year
- VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs☆1,285Jan 23, 2025Updated last year
- [NeurIPS 2023] Act As You Wish: Fine-Grained Control of Motion Diffusion Model with Hierarchical Semantic Graphs☆129Nov 15, 2023Updated 2 years ago
- Frontier Multimodal Foundation Models for Image and Video Understanding☆1,128Aug 14, 2025Updated 7 months ago
- Precision Search through Multi-Style Inputs☆74Jul 30, 2025Updated 7 months ago
- 【Nature Computational Science 2025🔥】Deep peak property learning for efficient chiral molecules ECD spectra prediction☆51Jan 12, 2025Updated last year
- LLM2CLIP significantly improves already state-of-the-art CLIP models.☆643Feb 1, 2026Updated last month
- [IJCAI 2023] Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment☆53Apr 9, 2024Updated last year
- The official repo for the DanQing dataset.☆32Jan 16, 2026Updated 2 months ago
- The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM", IJCV2025☆278May 26, 2025Updated 9 months ago
- [ECCV 2024] The official code for "AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shi…☆73Feb 9, 2026Updated last month
- GPT-4V(ision) as A Social Media Analysis Engine☆38Dec 20, 2024Updated last year
- [cvpr2023] implementation of out-of-candidate rectification methods☆15Feb 28, 2023Updated 3 years ago
- LLM Reasoning Benchmark & Chain-of-Thoughts Dataset for Chemistry☆47Oct 9, 2025Updated 5 months ago
- [NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective☆79Oct 31, 2024Updated last year
- Code for MetaMorph Multimodal Understanding and Generation via Instruction Tuning☆235Jan 22, 2026Updated 2 months ago
- [CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".☆449Aug 8, 2025Updated 7 months ago
- NeurIPS 2025 Spotlight; ICLR2024 Spotlight; CVPR 2024; EMNLP 2024☆1,826Nov 27, 2025Updated 3 months ago
- 【COLING 2025🔥】Code for the paper "Is Parameter Collision Hindering Continual Learning in LLMs?".☆38Dec 5, 2024Updated last year
- V1: Toward Multimodal Reasoning by Designing Auxiliary Task☆36Apr 14, 2025Updated 11 months ago
- [ICLR'25] PiCO: Peer Review in LLMs based on the Consistency Optimization, https://arxiv.org/pdf/2402.01830☆36Feb 16, 2025Updated last year
- [AAAI26] Next Patch Prediction☆132Jan 2, 2025Updated last year
- Unified Multi-modal IAA Baseline and Benchmark☆94Sep 27, 2024Updated last year
- [ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions☆138May 8, 2025Updated 10 months ago
- SEED-Voken: A Series of Powerful Visual Tokenizers☆998Nov 25, 2025Updated 3 months ago
- [ICLR'25] Reconstructive Visual Instruction Tuning☆136Apr 9, 2025Updated 11 months ago
- Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning"☆33Mar 26, 2025Updated 11 months ago
- The implementation of VectorNet. Done and Lose☆41Jun 21, 2020Updated 5 years ago
- [CVPR 2024 Highlight] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding☆388Oct 7, 2024Updated last year
- [NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations☆147Apr 9, 2024Updated last year
- GPT as a Monte Carlo Language Tree: A Probabilistic Perspective☆45Jan 18, 2025Updated last year