apple / ml-mobileclip-dr
RayGen: Multi-Modal Dataset Reinforcement for MobileCLIP and MobileCLIP2
☆37 Updated 5 months ago
Alternatives and similar repositories for ml-mobileclip-dr
Users who are interested in ml-mobileclip-dr are comparing it to the repositories listed below
- SmolVLM2 Demo ☆186 Updated 10 months ago
- VimTS: A Unified Video and Image Text Spotter ☆78 Updated last year
- Demo for Qwen2.5-VL-3B-Instruct on Axera device. ☆17 Updated 5 months ago
- [ICLR'26] Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs ☆97 Updated 2 weeks ago
- Utility to test the performance of CoreML models. ☆70 Updated 5 years ago
- Pixel Parsing. A reproduction of OCR-free end-to-end document understanding models with open data ☆23 Updated last year
- Official PyTorch implementation of "No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding" ☆32 Updated last year
- Video-LLaVA fine-tune for CinePile evaluation ☆51 Updated last year
- Export utility for unconstrained channel pruned models ☆71 Updated 2 years ago
- ☆91 Updated 2 years ago
- Evaluate the performance of computer vision models and prompts for zero-shot models (Grounding DINO, CLIP, BLIP, DINOv2, ImageBind, model… ☆37 Updated 2 years ago
- Python scripts performing optical flow estimation using the NeuFlowV2 model in ONNX. ☆52 Updated last year
- The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode… ☆90 Updated last year
- Official PyTorch Implementation of Self-emerging Token Labeling ☆35 Updated last year
- ☆69 Updated last year
- [ICCV2023] TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance ☆125 Updated last year
- The official repo for “Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem” [EMNLP25] ☆34 Updated 5 months ago
- EdgeSAM model for use with Autodistill. ☆29 Updated last year
- Research publication code for "Forward Compatible Training for Large-Scale Embedding Retrieval Systems", CVPR 2022, and "FastFill: Effici… ☆56 Updated 2 years ago
- imagetokenizer is a Python package that helps you encode visuals and generate visual token IDs from a codebook; supports both image and video… ☆40 Updated last year
- EfficientViT is a new family of vision models for efficient high-resolution vision. ☆30 Updated 2 years ago
- PyTorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models ☆28 Updated last year
- Description and applications of OpenAI's paper about DALL-E (2021) and implementation of other (CLIP-guided) zero-shot text-to-image gene… ☆33 Updated 3 years ago
- 🎮 Manipulates mobile phones just like how you would. Official code for "MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficien… ☆27 Updated 4 months ago
- ☆59 Updated last year
- [EMNLP 2024] RWKV-CLIP: A Robust Vision-Language Representation Learner ☆154 Updated last month
- ☆56 Updated last year
- MLLM-DataEngine: An Iterative Refinement Approach for MLLM ☆48 Updated last year
- ☆17 Updated 6 months ago
- mllm-npu: training multimodal large language models on Ascend NPUs ☆95 Updated last year