Mid-Push / SmartCLIP
SmartCLIP: A training method to improve CLIP with both short and long texts
☆19 · Updated 2 months ago
Alternatives and similar repositories for SmartCLIP
Users interested in SmartCLIP are comparing it to the repositories listed below
- Official implementation of Nemesis: Normalizing the Soft-prompt Vectors of Vision-Language Models (ICLR 2024 Spotlight) ☆13 · Updated 8 months ago
- [NeurIPS-2024] The official implementation of "Instruction-Guided Visual Masking" ☆38 · Updated 9 months ago
- [CVPR 2025] Official PyTorch Code for "MMRL: Multi-Modal Representation Learning for Vision-Language Models" and its extension "MMRL++: P… ☆69 · Updated 2 months ago
- ☆41 · Updated 2 months ago
- [CVPR2025] FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression ☆49 · Updated 6 months ago
- ☆12 · Updated 7 months ago
- [CVPR 2024] Code for HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation ☆73 · Updated 10 months ago
- Official code repository of Shuffle-R1 ☆24 · Updated last week
- ☆12 · Updated 9 months ago
- ☆22 · Updated 6 months ago
- Official implementation for the paper "Towards Understanding How Knowledge Evolves in Large Vision-Language Models" ☆17 · Updated 4 months ago
- 🚀 Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models ☆31 · Updated last month
- [NeurIPS2024] SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion ☆88 · Updated 3 months ago
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning" ☆83 · Updated last year
- [ICCV25 Oral] Token Activation Map to Visually Explain Multimodal LLMs ☆63 · Updated 3 weeks ago
- The official repo of our work "Pensieve: Retrospect-then-Compare mitigates Visual Hallucination" ☆16 · Updated last year
- [AAAI2025, selected as oral] Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints ☆34 · Updated 2 months ago
- ☆12 · Updated 7 months ago
- [ICCV2023] CoTDet: Affordance Knowledge Prompting for Task Driven Object Detection ☆17 · Updated 4 months ago
- [ICME 2024 Oral] DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding ☆22 · Updated 6 months ago
- [ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model ☆19 · Updated last year
- Code for Negative Yields Positive: Unified Dual-Path Adapter for Vision-Language Models ☆26 · Updated 10 months ago
- ☆21 · Updated 10 months ago
- [NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models ☆70 · Updated 3 months ago
- TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning ☆101 · Updated 3 months ago
- [CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models" ☆135 · Updated last year
- [CVPRW 2024] TrafficVLM: A Controllable Visual Language Model for Traffic Video Captioning. Official code for the 3rd place solution of t… ☆43 · Updated 6 months ago
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs ☆28 · Updated 4 months ago
- [ICML2025] Official Code of From Local Details to Global Context: Advancing Vision-Language Models with Attention-Based Selection ☆22 · Updated 2 months ago
- ML-Mamba: Efficient Multi-Modal Large Language Model Utilizing Mamba-2 ☆66 · Updated 9 months ago