This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025
☆7,269 · May 5, 2025 · Updated 10 months ago
Alternatives and similar repositories for ml-fastvlm
Users that are interested in ml-fastvlm are comparing it to the libraries listed below.
- The simplest, fastest repository for training/finetuning small-sized VLMs. ☆4,738 · Oct 27, 2025 · Updated 4 months ago
- This repository contains the official implementation of the research papers, "MobileCLIP" CVPR 2024 and "MobileCLIP2" TMLR August 2025 ☆1,461 · Oct 9, 2025 · Updated 5 months ago
- MLX: An array framework for Apple silicon ☆24,597 · Updated this week
- A Gemini 2.5 Flash Level MLLM for Vision, Speech, and Full-Duplex Multimodal Live Streaming on Your Phone ☆24,144 · Mar 7, 2026 · Updated 2 weeks ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆73,479 · Updated this week
- MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX. ☆2,309 · Updated this week
- Examples using MLX Swift ☆2,463 · Jan 22, 2026 · Updated 2 months ago
- Official inference framework for 1-bit LLMs ☆35,906 · Mar 10, 2026 · Updated 2 weeks ago
- Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud. ☆18,671 · Jan 30, 2026 · Updated last month
- Swift API for MLX ☆1,662 · Mar 12, 2026 · Updated last week
- [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond. ☆24,603 · Aug 12, 2024 · Updated last year
- On-device Speech Recognition for Apple Silicon ☆5,806 · Mar 16, 2026 · Updated last week
- Unsloth Studio is a web UI for training and running open models like Qwen, DeepSeek, gpt-oss and Gemma locally. ☆57,673 · Updated this week
- Stable Diffusion with Core ML on Apple Silicon ☆17,824 · Jul 3, 2025 · Updated 8 months ago
- State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More! ☆2,204 · Mar 12, 2026 · Updated last week
- Everything about the SmolLM and SmolVLM family of models ☆3,675 · Jan 13, 2026 · Updated 2 months ago
- LLM inference in C/C++ ☆98,911 · Updated this week
- A TTS model capable of generating ultra-realistic dialogue in one pass. ☆19,202 · Nov 19, 2025 · Updated 4 months ago
- The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode… ☆18,737 · Updated this week
- Examples in the MLX framework ☆8,375 · Feb 12, 2026 · Updated last month
- Real-time webcam demo with SmolVLM and llama.cpp server ☆5,532 · May 12, 2025 · Updated 10 months ago
- Depth Pro: Sharp Monocular Metric Depth in Less Than a Second. ☆5,374 · Apr 21, 2025 · Updated 11 months ago
- Lightweight coding agent that runs in your terminal ☆65,974 · Updated this week
- A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speec… ☆6,334 · Updated this week
- Universal memory layer for AI Agents ☆50,147 · Mar 17, 2026 · Updated last week
- CoreNet: A library for training deep neural networks ☆7,009 · Oct 9, 2025 · Updated 5 months ago
- Run frontier AI locally. ☆42,805 · Updated this week
- Universal LLM Deployment Engine with ML Compilation ☆22,246 · Updated this week
- [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance ☆9,904 · Sep 22, 2025 · Updated 6 months ago
- [ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning ☆1,469 · Jun 26, 2025 · Updated 8 months ago
- ☆8,689 · Oct 9, 2024 · Updated last year
- tiny vision language model ☆9,427 · Nov 14, 2025 · Updated 4 months ago
- Reference PyTorch implementation and models for DINOv3 ☆9,878 · Mar 11, 2026 · Updated last week
- Text-audio foundation model from Boson AI ☆7,990 · Jan 18, 2026 · Updated 2 months ago
- Open-source infrastructure for Computer-Use Agents. Sandboxes, SDKs, and benchmarks to train and evaluate AI agents that can control full… ☆13,197 · Updated this week
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… ☆9,898 · Mar 4, 2026 · Updated 2 weeks ago
- A simple screen parsing tool towards pure vision based GUI agent ☆24,546 · Sep 12, 2025 · Updated 6 months ago
- Official repository for LTX-Video ☆9,612 · Jan 5, 2026 · Updated 2 months ago
- Python tool for converting files and office documents to Markdown. ☆91,227 · Mar 16, 2026 · Updated last week