apple / ml-mobileclip-dr
RayGen: Multi-Modal Dataset Reinforcement for MobileCLIP and MobileCLIP2
☆33 · Updated 4 months ago
Alternatives and similar repositories for ml-mobileclip-dr
Users who are interested in ml-mobileclip-dr are comparing it to the libraries listed below.
- Codebase for the Recognize Anything Model (RAM) ☆88 · Updated 2 years ago
- SmolVLM2 Demo ☆180 · Updated 9 months ago
- VimTS: A Unified Video and Image Text Spotter ☆79 · Updated last year
- Utility to test the performance of CoreML models. ☆70 · Updated 5 years ago
- The SAIL-VL2 model series developed by the Bytedance Douyin Content Group ☆76 · Updated 3 months ago
- ☆91 · Updated last year
- Official implementation of "Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs". ☆95 · Updated last month
- ☆16 · Updated 2 years ago
- [EMNLP 2024] RWKV-CLIP: A Robust Vision-Language Representation Learner ☆145 · Updated 2 weeks ago
- A family of highly capable yet efficient large multimodal models ☆191 · Updated last year
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs ☆97 · Updated last year
- Official Repo of Graphist ☆129 · Updated last year
- Demo for Qwen2.5-VL-3B-Instruct on Axera devices. ☆17 · Updated 3 months ago
- A large-scale multimodal model that performs close to the closed-source Qwen-VL-PLUS on many datasets and significantly surpasses the p… ☆14 · Updated last year
- Chinese CLIP models with SOTA performance. ☆59 · Updated 2 years ago
- An efficient multi-modal instruction-following data synthesis tool and the official implementation of Oasis https://arxiv.org/abs/2503.08… ☆35 · Updated 6 months ago
- Export utility for unconstrained channel pruned models ☆72 · Updated 2 years ago
- EfficientViT is a new family of vision models for efficient high-resolution vision. ☆29 · Updated 2 years ago
- The official repo for “Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem” [EMNLP25] ☆33 · Updated 4 months ago
- Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning ☆131 · Updated 6 months ago
- The official repository of the dots.vlm1 instruct models proposed by rednote-hilab. ☆276 · Updated 3 months ago
- The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions" ☆248 · Updated 11 months ago
- ☆29 · Updated last year
- Zero-label image classification via OpenCLIP knowledge distillation ☆139 · Updated 2 years ago
- [ICCV 2023] TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance ☆121 · Updated last year
- Video-LLaVA fine-tune for CinePile evaluation ☆51 · Updated last year
- [ICCV 2025] OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning ☆412 · Updated last month
- [CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era" ☆212 · Updated last year
- [PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension ☆28 · Updated 2 years ago
- MuLan: Adapting Multilingual Diffusion Models for 110+ Languages (adds multilingual capability to any diffusion model without additional training) ☆144 · Updated 11 months ago