[CVPR 2025 Highlight] The official CLIP training codebase for Inf-CL, from the paper "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss": a highly memory-efficient CLIP training scheme.
(☆279, last updated Jan 16, 2025)
Alternatives and similar repositories for Inf-CLIP
Users who are interested in Inf-CLIP are comparing it to the repositories listed below.
- ✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio (☆52, last updated Jul 11, 2025)
- LLMBind: A Unified Modality-Task Integration Framework (☆19, last updated Jun 16, 2024)
- [CVPR 2023] Out-of-Distributed Semantic Pruning for Robust Semi-Supervised Learning (☆22, last updated Jun 11, 2023)
- [ICCV 2025 Highlight] The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining" (☆194, last updated Mar 17, 2025)
- Run Effective Large Batch Contrastive Learning Beyond GPU/TPU Memory Constraint (☆427, last updated Mar 26, 2024)
- [CVPR 2023] Fuzzy Positive Learning (☆15, last updated Jul 25, 2024)
- [AAAI 2025] Open-vocabulary Video Instance Segmentation codebase built upon Detectron2, designed to be easy to use (☆25, last updated Dec 30, 2024)
- [IJCV 2025] Code for "TokenPacker: Efficient Visual Projector for Multimodal LLM" (☆276, last updated May 26, 2025)
- Precision Search through Multi-Style Inputs (☆73, last updated Jul 30, 2025)
- NeurIPS 2025 Spotlight; ICLR 2024 Spotlight; CVPR 2024; EMNLP 2024 (☆1,815, last updated Nov 27, 2025)
- [CVPR 2023 Highlight & TPAMI] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning (☆126, last updated Dec 28, 2024)
- LLM2CLIP significantly improves already state-of-the-art CLIP models (☆630, last updated Feb 1, 2026)
- Code for MetaMorph: Multimodal Understanding and Generation via Instruction Tuning (☆234, last updated Jan 22, 2026)
- [ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization (☆43, last updated Feb 27, 2025)
- Frontier Multimodal Foundation Models for Image and Video Understanding (☆1,109, last updated Aug 14, 2025)
- The official repo for the DanQing dataset (☆30, last updated Jan 16, 2026)
- EVE Series: Encoder-Free Vision-Language Models from BAAI (☆368, last updated Jul 24, 2025)
- 【Nature Computational Science 2025🔥】Deep peak property learning for efficient chiral molecules ECD spectra prediction (☆51, last updated Jan 12, 2025)
- [CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation" (☆438, last updated Aug 8, 2025)
- LLM Reasoning Benchmark & Chain-of-Thoughts Dataset for Chemistry (☆45, last updated Oct 9, 2025)
- [NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training (☆226, last updated Mar 20, 2025)
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture (☆213, last updated Jan 6, 2025)
- [NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training" (☆319, last updated Jun 3, 2024)
- VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs (☆1,278, last updated Jan 23, 2025)
- Unified Multi-modal IAA Baseline and Benchmark (☆93, last updated Sep 27, 2024)
- [NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective (☆79, last updated Oct 31, 2024)
- [ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions (☆138, last updated May 8, 2025)
- [ECCV 2024] Official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP" (☆893, last updated Aug 13, 2024)
- V1: Toward Multimodal Reasoning by Designing Auxiliary Task (☆36, last updated Apr 14, 2025)
- Next-Token Prediction is All You Need (☆2,355, last updated Jan 12, 2026)
- SEED-Voken: A Series of Powerful Visual Tokenizers (☆996, last updated Nov 25, 2025)
- Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning" (☆32, last updated Mar 26, 2025)
- GPT-4V(ision) as A Social Media Analysis Engine (☆38, last updated Dec 20, 2024)
- [ICLR'25] Reconstructive Visual Instruction Tuning (☆135, last updated Apr 9, 2025)
- 【COLING 2025🔥】Code for the paper "Is Parameter Collision Hindering Continual Learning in LLMs?" (☆38, last updated Dec 5, 2024)
- [AAAI 2026] Next Patch Prediction (☆132, last updated Jan 2, 2025)
- When do we not need larger vision models? (☆413, last updated Feb 8, 2025)
- [NeurIPS 2023] Text data, code and pre-trained models for paper "Improving CLIP Training with Language Rewrites" (☆289, last updated Jan 14, 2024)
- [ICLR'25] PiCO: Peer Review in LLMs based on the Consistency Optimization, https://arxiv.org/pdf/2402.01830 (☆36, last updated Feb 16, 2025)