apple / ml-fastvlm
This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025
☆3,965 Updated 3 weeks ago
Alternatives and similar repositories for ml-fastvlm
Users who are interested in ml-fastvlm are comparing it to the libraries listed below.
- A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speec… ☆2,303 Updated this week
- ☆3,008 Updated this week
- [CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents ☆1,670 Updated this week
- (🚧 WIP) a course of LLM inference serving on Apple Silicon for systems engineers. ☆1,935 Updated this week
- SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer ☆4,197 Updated last week
- The simplest, fastest repository for training/finetuning small-sized VLMs. ☆3,003 Updated this week
- ☆6,275 Updated last week
- MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX. ☆1,293 Updated this week
- A Model Context Protocol server for searching and analyzing arXiv papers ☆1,167 Updated last month
- Open Source DeepWiki: AI-Powered Wiki Generator for GitHub/Gitlab/Bitbucket Repositories. Join the discord: https://discord.gg/gMwThUMeme ☆6,017 Updated this week
- Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud. ☆10,709 Updated 2 weeks ago
- Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and pe… ☆3,042 Updated this week
- StarVector is a foundation model for SVG generation that transforms vectorization into a code generation task. Using a vision-language mo… ☆3,852 Updated last month
- Expose your FastAPI endpoints as Model Context Protocol (MCP) tools, with Auth! ☆5,192 Updated 2 weeks ago
- Everything about the SmolLM2 and SmolVLM family of models ☆2,442 Updated 2 months ago
- Run LLMs with MLX ☆877 Updated this week
- Keep searching, reading webpages, reasoning until it finds the answer (or exceeds the token budget) ☆4,349 Updated last week
- Official repository for LTX-Video ☆6,281 Updated last week
- Implementation for Describe Anything: Detailed Localized Image and Video Captioning ☆1,121 Updated 3 weeks ago
- 🚀 The fast, Pythonic way to build MCP servers and clients ☆11,359 Updated this week
- Collection of apple-native tools for the model context protocol. ☆1,679 Updated last month
- The official Python SDK for Model Context Protocol servers and clients ☆13,211 Updated last week
- Wan: Open and Advanced Large-Scale Video Generative Models ☆11,851 Updated this week
- A unified library for object tracking featuring clean room re-implementations of leading multi-object tracking algorithms ☆1,733 Updated this week
- DeerFlow is a community-driven Deep Research framework, combining language models with tools like web search, crawling, and Python execut… ☆12,218 Updated this week
- MAGI-1: Autoregressive Video Generation at Scale ☆3,191 Updated this week
- Build effective agents using Model Context Protocol and simple workflow patterns ☆4,855 Updated this week
- Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation ☆3,687 Updated this week
- Have a natural, spoken conversation with AI! ☆2,375 Updated 2 weeks ago
- ☆4,316 Updated 2 months ago