apple / ml-fastvlm
This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025
☆6,583 Updated 4 months ago
Alternatives and similar repositories for ml-fastvlm
Users who are interested in ml-fastvlm are comparing it to the repositories listed below.
- Everything about the SmolLM and SmolVLM family of models ☆3,247 Updated last week
- Text-audio foundation model from Boson AI ☆7,291 Updated last week
- The simplest, fastest repository for training/finetuning small-sized VLMs. ☆4,047 Updated last week
- MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX. ☆1,644 Updated 2 weeks ago
- Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and pe… ☆3,641 Updated 3 months ago
- Renderer for the harmony response format to be used with gpt-oss ☆3,800 Updated last month
- PyTorch code and models for VJEPA2 self-supervised learning from video. ☆2,198 Updated 3 weeks ago
- Run LLMs with MLX ☆2,313 Updated this week
- [ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning ☆1,333 Updated 2 months ago
- [CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents ☆1,806 Updated 3 months ago
- This repository contains the official implementation of the research papers, "MobileCLIP" CVPR 2024 and "MobileCLIP2" TMLR August 2025 ☆1,218 Updated this week
- A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speec… ☆2,674 Updated last week
- Open-source unified multimodal model ☆5,038 Updated last month
- Kimi K2 is the large language model series developed by Moonshot AI team ☆8,212 Updated last week
- Wan: Open and Advanced Large-Scale Video Generative Models ☆5,453 Updated this week
- MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model. ☆2,873 Updated 2 months ago
- Real-time webcam demo with SmolVLM and llama.cpp server ☆4,745 Updated 4 months ago
- Multilingual Document Layout Parsing in a Single Vision-Language Model ☆4,466 Updated 2 weeks ago
- SpatialLM: Training Large Language Models for Structured Indoor Modeling ☆3,970 Updated 3 weeks ago
- gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI ☆18,478 Updated this week
- Reference PyTorch implementation and models for DINOv3 ☆7,021 Updated last week
- Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing. ☆5,045 Updated 2 weeks ago
- Embedding Atlas is a tool that provides interactive visualizations for large embeddings. It allows you to visualize, cross-filter, and se… ☆3,822 Updated last week
- Hibiki is a model for streaming speech translation (also known as simultaneous translation). Unlike offline translation—where one waits f… ☆1,270 Updated 5 months ago
- MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining ☆1,562 Updated 3 months ago
- Open-source implementation of AlphaEvolve ☆3,936 Updated this week
- Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation ☆4,243 Updated 3 months ago
- Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory" ☆6,954 Updated 6 months ago
- The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025. ☆5,811 Updated 3 weeks ago
- State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More! ☆1,627 Updated last week