apple / ml-mobileclip
This repository contains the official implementation of the research paper "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training" (CVPR 2024).
☆792 · Updated last month
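For context, MobileCLIP exposes the usual CLIP-style image/text encoding interface. Below is a minimal zero-shot matching sketch using the repository's `mobileclip` package; the variant name `mobileclip_s0` and the local checkpoint path are assumptions, and helper names may differ slightly across versions.

```python
import torch
from PIL import Image
import mobileclip  # package provided by the ml-mobileclip repository

# Assumed variant and checkpoint path; point these at whichever
# MobileCLIP checkpoint you have downloaded.
model, _, preprocess = mobileclip.create_model_and_transforms(
    "mobileclip_s0", pretrained="checkpoints/mobileclip_s0.pt"
)
tokenizer = mobileclip.get_tokenizer("mobileclip_s0")
model.eval()

image = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)
text = tokenizer(["a diagram", "a dog", "a cat"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize embeddings and compare them for zero-shot classification.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probabilities:", text_probs)
```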
Alternatives and similar repositories for ml-mobileclip:
Users interested in ml-mobileclip are comparing it to the libraries listed below:
- This repository provides the code and model checkpoints for the AIMv1 and AIMv2 research projects. ☆1,143 · Updated last month
- 4M: Massively Multimodal Masked Modeling ☆1,666 · Updated 3 months ago
- ICLR 2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Expert… ☆1,337 · Updated last month
- Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series ☆853 · Updated 5 months ago
- Strong and Open Vision Language Assistant for Mobile Devices ☆1,106 · Updated 9 months ago
- Official PyTorch implementation of "EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM" ☆937 · Updated 5 months ago
- Official repository for "AM-RADIO: Reduce All Domains Into One" ☆892 · Updated this week
- The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions" ☆238 · Updated 4 months ago
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024) ☆1,217 · Updated last month
- Efficient Track Anything ☆441 · Updated last week
- Streamlines the fine-tuning process for multimodal models: PaliGemma, Florence-2, and Qwen2-VL ☆1,427 · Updated this week
- Quick exploration into fine-tuning Florence-2 ☆289 · Updated 3 months ago
- Hiera: A fast, powerful, and simple hierarchical vision transformer. ☆940 · Updated 10 months ago
- RobustSAM: Segment Anything Robustly on Degraded Images (CVPR 2024 Highlight) ☆331 · Updated 4 months ago
- A family of lightweight multimodal models. ☆972 · Updated last month
- Official implementation of the CVPR 2024 highlight paper "Matching Anything by Segmenting Anything" ☆1,152 · Updated 2 months ago
- VisionLLM Series ☆977 · Updated 2 weeks ago
- Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2 ☆1,518 · Updated 3 weeks ago
- LLM2CLIP makes state-of-the-art pretrained CLIP models even stronger. ☆442 · Updated this week
- [NeurIPS 2024] SlimSAM: 0.1% Data Makes Segment Anything Slim ☆319 · Updated 2 months ago
- Famous Vision Language Models and Their Architectures ☆565 · Updated 4 months ago
- This repository is a curated collection of the most exciting and influential CVPR 2024 papers. 🔥 [Paper + Code + Demo] ☆685 · Updated 6 months ago
- LLaVA-Interactive-Demo ☆360 · Updated 5 months ago
- A suite of image and video neural tokenizers ☆1,478 · Updated this week
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… ☆814 · Updated last month
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou… ☆2,752 · Updated last week
- MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX. ☆706 · Updated this week
- [AAAI 2025] Official PyTorch implementation of "TinySAM: Pushing the Envelope for Efficient Segment Anything Model" ☆424 · Updated last week
- DINO-X: The World's Top-Performing Vision Model for Open-World Object Detection and Understanding ☆791 · Updated 2 weeks ago