apple / ml-fastvlm
This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025
☆4,313 Updated 2 months ago
Alternatives and similar repositories for ml-fastvlm
Users who are interested in ml-fastvlm compare it to the libraries listed below.
- A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speec… ☆2,470 Updated last week
- The simplest, fastest repository for training/finetuning small-sized VLMs. ☆3,726 Updated this week
- [CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents ☆1,749 Updated last month
- A unified library for object tracking featuring clean room re-implementations of leading multi-object tracking algorithms ☆1,851 Updated this week
- Open-source unified multimodal model ☆4,540 Updated 2 weeks ago
- MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX. ☆1,498 Updated this week
- Everything about the SmolLM and SmolVLM family of models ☆2,803 Updated this week
- SpatialLM: Training Large Language Models for Structured Indoor Modeling ☆3,489 Updated 3 weeks ago
- ☆5,616 Updated 2 months ago
- Run LLMs with MLX ☆1,276 Updated this week
- A course of learning LLM inference serving on Apple Silicon for systems engineers. ☆2,730 Updated last month
- Real-time webcam demo with SmolVLM and llama.cpp server ☆4,031 Updated 2 months ago
- Examples using MLX Swift ☆1,947 Updated last week
- This repository contains the official implementation of the research paper, "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinf… ☆988 Updated 7 months ago
- StarVector is a foundation model for SVG generation that transforms vectorization into a code generation task. Using a vision-language mo… ☆3,939 Updated 3 months ago
- RF-DETR is a real-time object detection model architecture developed by Roboflow, SOTA on COCO & designed for fine-tuning. ☆2,331 Updated this week
- [ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning ☆1,241 Updated 2 weeks ago
- MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model. ☆2,631 Updated last week
- A cross-platform framework for deploying LLMs, VLMs, Embedding Models, TTS models and more locally on smartphones. ☆1,290 Updated last week
- Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation ☆3,962 Updated 3 weeks ago
- A Model Context Protocol server for searching and analyzing arXiv papers ☆1,397 Updated last month
- The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025. ☆4,206 Updated this week
- Making the community's best AI chat models available to everyone. ☆1,968 Updated 5 months ago
- MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining ☆1,496 Updated last month
- Kimi K2 is the large language model series developed by Moonshot AI team ☆1,850 Updated this week
- Have a natural, spoken conversation with AI! ☆2,718 Updated 3 weeks ago
- Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and pe… ☆3,309 Updated last month
- Official PyTorch implementation for "Large Language Diffusion Models" ☆2,530 Updated 3 weeks ago
- Official inference framework for 1-bit LLMs ☆20,482 Updated last month
- Build Real-Time Knowledge Graphs for AI Agents ☆12,727 Updated this week