This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025
☆7,302 · May 5, 2025 · Updated 11 months ago
Alternatives and similar repositories for ml-fastvlm
Users that are interested in ml-fastvlm are comparing it to the libraries listed below.
- The simplest, fastest repository for training/finetuning small-sized VLMs. ☆4,801 · Oct 27, 2025 · Updated 5 months ago
- This repository contains the official implementation of the research papers, "MobileCLIP" CVPR 2024 and "MobileCLIP2" TMLR August 2025 ☆1,484 · Oct 9, 2025 · Updated 6 months ago
- MLX: An array framework for Apple silicon ☆25,200 · Updated this week
- A Gemini 2.5 Flash Level MLLM for Vision, Speech, and Full-Duplex Multimodal Live Streaming on Your Phone ☆24,322 · Apr 1, 2026 · Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆75,637 · Updated this week
- Examples using MLX Swift ☆2,496 · Updated this week
- Official inference framework for 1-bit LLMs ☆38,049 · Mar 10, 2026 · Updated last month
- Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud. ☆18,917 · Jan 30, 2026 · Updated 2 months ago
- [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond. ☆24,675 · Aug 12, 2024 · Updated last year
- Swift API for MLX ☆1,751 · Updated this week
- On-device Speech Recognition for Apple Silicon ☆5,949 · Apr 1, 2026 · Updated last week
- Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.5, DeepSeek, gpt-oss locally. ☆61,312 · Updated this week
- Stable Diffusion with Core ML on Apple Silicon ☆17,821 · Jul 3, 2025 · Updated 9 months ago
- Everything about the SmolLM and SmolVLM family of models ☆3,705 · Apr 2, 2026 · Updated last week
- State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More! ☆2,242 · Mar 12, 2026 · Updated last month
- The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode… ☆18,904 · Updated this week
- LLM inference in C/C++ ☆103,237 · Updated this week
- MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX. ☆4,138 · Apr 6, 2026 · Updated last week
- A TTS model capable of generating ultra-realistic dialogue in one pass. ☆19,247 · Nov 19, 2025 · Updated 4 months ago
- Examples in the MLX framework ☆8,459 · Apr 6, 2026 · Updated last week
- Real-time webcam demo with SmolVLM and llama.cpp server ☆5,541 · May 12, 2025 · Updated 11 months ago
- Depth Pro: Sharp Monocular Metric Depth in Less Than a Second. ☆5,428 · Apr 21, 2025 · Updated 11 months ago
- Lightweight coding agent that runs in your terminal ☆73,775 · Updated this week
- A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speec… ☆6,641 · Updated this week
- Universal memory layer for AI Agents ☆52,137 · Apr 6, 2026 · Updated last week
- CoreNet: A library for training deep neural networks ☆7,004 · Oct 9, 2025 · Updated 6 months ago
- Run frontier AI locally. ☆43,503 · Updated this week
- Universal LLM Deployment Engine with ML Compilation ☆22,414 · Apr 6, 2026 · Updated last week
- [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance. ☆9,949 · Sep 22, 2025 · Updated 6 months ago
- [ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning ☆1,477 · Jun 26, 2025 · Updated 9 months ago
- Text-audio foundation model from Boson AI ☆8,020 · Jan 18, 2026 · Updated 2 months ago
- ☆8,685 · Oct 9, 2024 · Updated last year
- tiny vision language model ☆9,554 · Nov 14, 2025 · Updated 5 months ago
- Reference PyTorch implementation and models for DINOv3 ☆10,057 · Mar 30, 2026 · Updated 2 weeks ago
- Open-source infrastructure for Computer-Use Agents. Sandboxes, SDKs, and benchmarks to train and evaluate AI agents that can control full… ☆13,436 · Updated this week
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… ☆10,010 · Mar 4, 2026 · Updated last month
- A simple screen parsing tool towards pure vision based GUI agent ☆24,619 · Sep 12, 2025 · Updated 7 months ago
- Official repository for LTX-Video ☆9,872 · Jan 5, 2026 · Updated 3 months ago
- Toolkit for linearizing PDFs for LLM datasets/training ☆17,120 · Mar 25, 2026 · Updated 2 weeks ago