apple / ml-fastvlm

This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025

☆4,313

Alternatives and similar repositories for ml-fastvlm

Users who are interested in ml-fastvlm are comparing it to the repositories listed below.

Blaizzy / mlx-audio
A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speec…
☆2,470 Updated last week
huggingface / nanoVLM
The simplest, fastest repository for training/finetuning small-sized VLMs.
☆3,726 Updated this week
microsoft / Magma
[CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents
☆1,749 Updated last month
roboflow / trackers
A unified library for object tracking featuring clean room re-implementations of leading multi-object tracking algorithms
☆1,851 Updated this week
ByteDance-Seed / Bagel
Open-source unified multimodal model
☆4,540 Updated 2 weeks ago
Blaizzy / mlx-vlm
MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.
☆1,498 Updated this week
huggingface / smollm
Everything about the SmolLM and SmolVLM family of models
☆2,803 Updated this week
manycore-research / SpatialLM
SpatialLM: Training Large Language Models for Structured Indoor Modeling
☆3,489 Updated 3 weeks ago
bytedance / MegaTTS3
☆5,616 Updated 2 months ago
ml-explore / mlx-lm
Run LLMs with MLX
☆1,276 Updated this week
skyzh / tiny-llm
A course of learning LLM inference serving on Apple Silicon for systems engineers.
☆2,730 Updated last month
ngxson / smolvlm-realtime-webcam
Real-time webcam demo with SmolVLM and llama.cpp server
☆4,031 Updated 2 months ago
ml-explore / mlx-swift-examples
Examples using MLX Swift
☆1,947 Updated last week
apple / ml-mobileclip
This repository contains the official implementation of the research paper, "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinf…
☆988 Updated 7 months ago
joanrod / star-vector
StarVector is a foundation model for SVG generation that transforms vectorization into a code generation task. Using a vision-language mo…
☆3,939 Updated 3 months ago
roboflow / rf-detr
RF-DETR is a real-time object detection model architecture developed by Roboflow, SOTA on COCO & designed for fine-tuning.
☆2,331 Updated this week
NVlabs / describe-anything
[ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning
☆1,241 Updated 2 weeks ago
MiniMax-AI / MiniMax-M1
MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model.
☆2,631 Updated last week
cactus-compute / cactus
A cross-platform framework for deploying LLMs, VLMs, Embedding Models, TTS models and more locally on smartphones.
☆1,290 Updated last week
MoonshotAI / Kimi-Audio
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
☆3,962 Updated 3 weeks ago
blazickjp / arxiv-mcp-server
A Model Context Protocol server for searching and analyzing arXiv papers
☆1,397 Updated last month
bytedance / Dolphin
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
☆4,206 Updated this week
huggingface / chat-macOS
Making the community's best AI chat models available to everyone.
☆1,968 Updated 5 months ago
XiaomiMiMo / MiMo
MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining
☆1,496 Updated last month
MoonshotAI / Kimi-K2
Kimi K2 is the large language model series developed by Moonshot AI team
☆1,850 Updated this week
KoljaB / RealtimeVoiceChat
Have a natural, spoken conversation with AI!
☆2,718 Updated 3 weeks ago
QwenLM / Qwen2.5-Omni
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and pe…
☆3,309 Updated last month
ML-GSAI / LLaDA
Official PyTorch implementation for "Large Language Diffusion Models"
☆2,530 Updated 3 weeks ago
microsoft / BitNet
Official inference framework for 1-bit LLMs
☆20,482 Updated last month
getzep / graphiti
Build Real-Time Knowledge Graphs for AI Agents
☆12,727 Updated this week