apple / ml-mobileclip

This repository contains the official implementation of the research paper "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training" (CVPR 2024).

☆988

Alternatives and similar repositories for ml-mobileclip

Users who are interested in ml-mobileclip are comparing it to the repositories listed below.

apple / ml-aim
This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects.
☆1,323 Updated 2 months ago
apple / ml-4m
4M: Massively Multimodal Masked Modeling
☆1,748 Updated last month
facebookresearch / MetaCLIP
ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Expert…
☆1,474 Updated this week
IDEA-Research / Grounding-DINO-1.5-API
Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series
☆983 Updated 5 months ago
facebookresearch / perception_models
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
☆1,402 Updated last month
Meituan-AutoML / MobileVLM
Strong and Open Vision Language Assistant for Mobile Devices
☆1,239 Updated last year
NVlabs / RADIO
Official repository for "AM-RADIO: Reduce All Domains Into One"
☆1,241 Updated last week
microsoft / LLM2CLIP
LLM2CLIP makes SOTA pretrained CLIP model more SOTA ever.
☆531 Updated 2 weeks ago
apple / ml-fastvit
This repository contains the official implementation of the research paper, "FastViT: A Fast Hybrid Vision Transformer using Structural R…
☆1,939 Updated last year
facebookresearch / hiera
Hiera: A fast, powerful, and simple hierarchical vision transformer.
☆1,003 Updated last year
chongzhou96 / EdgeSAM
Official PyTorch implementation of "EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM"
☆1,038 Updated last month
mbzuai-oryx / groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha…
☆893 Updated last month
merveenoyan / siglip
Projects based on SigLIP (Zhai et al., 2023) and Hugging Face transformers integration 🤗
☆256 Updated 4 months ago
yformer / EfficientTAM
Efficient Track Anything
☆586 Updated 6 months ago
allenai / molmo
Code for the Molmo Vision-Language Model
☆557 Updated 7 months ago
OpenGVLab / VisionLLM
VisionLLM Series
☆1,084 Updated 4 months ago
siyuanliii / masa
Official Implementation of CVPR24 highlight paper: Matching Anything by Segmenting Anything
☆1,317 Updated 2 months ago
NVlabs / FasterViT
[ICLR 2024] Official PyTorch implementation of FasterViT: Fast Vision Transformers with Hierarchical Attention
☆860 Updated 3 months ago
andimarafioti / florence2-finetuning
Quick exploration into fine-tuning Florence-2
☆323 Updated 9 months ago
apple / ml-veclip
The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions"
☆244 Updated 5 months ago
THU-MIG / yoloe
YOLOE: Real-Time Seeing Anything [ICCV 2025]
☆1,420 Updated 3 weeks ago
xinghaochen / TinySAM
[AAAI 2025] Official PyTorch implementation of "TinySAM: Pushing the Envelope for Efficient Segment Anything Model"
☆490 Updated 5 months ago
UCSC-VLAA / OpenVision
[ICCV 2025] OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning
☆283 Updated 2 months ago
czg1225 / SlimSAM
[NeurIPS 2024] SlimSAM: 0.1% Data Makes Segment Anything Slim
☆336 Updated 4 months ago
IDEA-Research / DINO-X-API
DINO-X: The World's Top-Performing Vision Model for Open-World Object Detection and Understanding
☆1,117 Updated 3 weeks ago
huggingface / sam2-studio
☆359 Updated 9 months ago
robustsam / RobustSAM
RobustSAM: Segment Anything Robustly on Degraded Images (CVPR 2024 Highlight)
☆355 Updated 10 months ago
baaivision / tokenize-anything
[ECCV 2024] Tokenize Anything via Prompting
☆585 Updated 7 months ago
LLaVA-VL / LLaVA-Interactive-Demo
LLaVA-Interactive-Demo
☆374 Updated 11 months ago
NVlabs / describe-anything
[ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning
☆1,241 Updated 3 weeks ago