apple / ml-fastvlm
This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025
☆6,891 · Updated 6 months ago
Alternatives and similar repositories for ml-fastvlm
Users who are interested in ml-fastvlm are comparing it to the repositories listed below.
- Run LLMs with MLX · ☆2,868 · Updated this week
- The simplest, fastest repository for training/finetuning small-sized VLMs. · ☆4,294 · Updated 3 weeks ago
- MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX. · ☆1,867 · Updated this week
- Kimi K2 is the large language model series developed by Moonshot AI team · ☆9,516 · Updated 2 weeks ago
- RF-DETR is a real-time object detection and segmentation model architecture developed by Roboflow, SOTA on COCO and designed for fine-tun… · ☆4,357 · Updated last week
- Everything about the SmolLM and SmolVLM family of models · ☆3,408 · Updated 2 months ago
- Renderer for the harmony response format to be used with gpt-oss · ☆4,020 · Updated 2 weeks ago
- This repository contains the official implementation of the research papers, "MobileCLIP" CVPR 2024 and "MobileCLIP2" TMLR August 2025 · ☆1,308 · Updated last month
- [CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents · ☆1,846 · Updated last month
- Open-source unified multimodal model · ☆5,305 · Updated 3 weeks ago
- Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, im… · ☆2,921 · Updated last month
- State-of-the-art TTS model under 25MB 😻 · ☆9,099 · Updated 3 months ago
- A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speec… · ☆2,924 · Updated this week
- [NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling · ☆4,090 · Updated last month
- A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen. · ☆3,409 · Updated 3 weeks ago
- Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation · ☆4,364 · Updated 5 months ago
- ☆6,028 · Updated 2 months ago
- Depth Pro: Sharp Monocular Metric Depth in Less Than a Second. · ☆4,998 · Updated 7 months ago
- Text-audio foundation model from Boson AI · ☆7,642 · Updated 2 months ago
- Kernels & AI inference engine for phones · ☆3,725 · Updated last week
- Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and pe… · ☆3,805 · Updated 5 months ago
- Easily fine-tune, evaluate and deploy gpt-oss, Qwen3, DeepSeek-R1, or any open source LLM / VLM! · ☆8,617 · Updated this week
- Towards Human-Sounding Speech · ☆5,729 · Updated 6 months ago
- The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025. · ☆7,787 · Updated 2 weeks ago
- ☆8,223 · Updated last week
- MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model. · ☆2,992 · Updated 4 months ago
- Examples using MLX Swift · ☆2,308 · Updated last week
- Hibiki is a model for streaming speech translation (also known as simultaneous translation). Unlike offline translation—where one waits f… · ☆1,325 · Updated 7 months ago
- The official repo of MiniMax-Text-01 and MiniMax-VL-01, large-language-model & vision-language-model based on Linear Attention · ☆3,250 · Updated 4 months ago
- On-device TTS model by Neuphonic · ☆4,044 · Updated this week