apple / ml-fastvlm
This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025
☆7,194 Updated 9 months ago
Alternatives and similar repositories for ml-fastvlm
Users who are interested in ml-fastvlm are comparing it to the repositories listed below
- Run LLMs with MLX ☆3,577 Updated this week
- This repository contains the official implementation of the research papers, "MobileCLIP" CVPR 2024 and "MobileCLIP2" TMLR August 2025 ☆1,416 Updated 4 months ago
- MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX. ☆2,108 Updated last week
- A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speec… ☆5,842 Updated this week
- [CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents ☆1,895 Updated 2 weeks ago
- Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, im… ☆3,399 Updated last month
- Text-audio foundation model from Boson AI ☆7,912 Updated 3 weeks ago
- [NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling ☆4,231 Updated 4 months ago
- Renderer for the harmony response format to be used with gpt-oss ☆4,171 Updated last month
- The simplest, fastest repository for training/finetuning small-sized VLMs. ☆4,625 Updated 3 months ago
- Sharp Monocular View Synthesis in Less Than a Second ☆7,447 Updated last month
- Everything about the SmolLM and SmolVLM family of models ☆3,602 Updated 3 weeks ago
- The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trai… ☆3,286 Updated last month
- [ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning ☆1,448 Updated 7 months ago
- Kimi K2 is the large language model series developed by Moonshot AI team ☆10,347 Updated 3 weeks ago
- Embedding Atlas is a tool that provides interactive visualizations for large embeddings. It allows you to visualize, cross-filter, and se… ☆4,582 Updated last week
- Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and pe… ☆3,914 Updated 8 months ago
- Contexts Optical Compression ☆22,430 Updated 2 weeks ago
- GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models ☆4,144 Updated last week
- Open-source unified multimodal model ☆5,654 Updated 3 months ago
- Kernels & AI inference engine for mobile devices. ☆4,238 Updated last week
- State-of-the-art TTS model under 25MB 😻 ☆9,590 Updated last week
- Examples using MLX Swift ☆2,413 Updated 2 weeks ago
- Trackers gives you clean, modular re-implementations of leading multi-object tracking algorithms released under the permissive Apache 2.0… ☆2,389 Updated this week
- Have a natural, spoken conversation with AI! ☆3,506 Updated 7 months ago
- The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading t… ☆7,665 Updated last week
- A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen. ☆3,774 Updated last month
- Depth Pro: Sharp Monocular Metric Depth in Less Than a Second. ☆5,262 Updated 9 months ago
- [ICLR 2026] RF-DETR is a real-time object detection and segmentation model architecture developed by Roboflow, SOTA on COCO, designed for… ☆5,527 Updated this week
- Reference PyTorch implementation and models for DINOv3 ☆9,525 Updated 2 months ago