apple / ml-fastvlm
This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025
☆5,263 · Updated 3 months ago
Alternatives and similar repositories for ml-fastvlm
Users who are interested in ml-fastvlm are comparing it to the libraries listed below.
- The simplest, fastest repository for training/finetuning small-sized VLMs. ☆3,855 · Updated last week
- A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speec… ☆2,543 · Updated last week
- Everything about the SmolLM and SmolVLM family of models ☆3,108 · Updated last week
- [CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents ☆1,773 · Updated 2 months ago
- Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation ☆4,148 · Updated last month
- Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and pe… ☆3,452 · Updated 2 months ago
- A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen. ☆2,863 · Updated this week
- Open-source unified multimodal model ☆4,788 · Updated last month
- MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX. ☆1,563 · Updated 3 weeks ago
- Real-time webcam demo with SmolVLM and llama.cpp server ☆4,077 · Updated 3 months ago
- RF-DETR is a real-time object detection model architecture developed by Roboflow, SOTA on COCO and designed for fine-tuning. ☆2,696 · Updated last week
- SpatialLM: Training Large Language Models for Structured Indoor Modeling ☆3,571 · Updated 3 weeks ago
- Renderer for the harmony response format to be used with gpt-oss ☆2,637 · Updated this week
- This repository contains the official implementation of the research paper, "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinf… ☆1,016 · Updated 8 months ago
- ☆6,843 · Updated 2 months ago
- Text-audio foundation model from Boson AI ☆6,662 · Updated last week
- Run LLMs with MLX ☆1,587 · Updated this week
- PyTorch code and models for VJEPA2 self-supervised learning from video. ☆2,001 · Updated last month
- Have a natural, spoken conversation with AI! ☆2,901 · Updated last month
- A unified library for object tracking featuring clean room re-implementations of leading multi-object tracking algorithms ☆1,899 · Updated this week
- DeerFlow is a community-driven Deep Research framework, combining language models with tools like web search, crawling, and Python execut… ☆16,053 · Updated this week
- Implementing DeepSeek R1's GRPO algorithm from scratch ☆1,517 · Updated 3 months ago
- StarVector is a foundation model for SVG generation that transforms vectorization into a code generation task. Using a vision-language mo… ☆3,975 · Updated 3 months ago
- ☆3,489 · Updated 4 months ago
- Nano vLLM ☆5,698 · Updated last month
- Towards Human-Sounding Speech ☆5,377 · Updated 3 months ago
- MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining ☆1,524 · Updated 2 months ago
- State-of-the-art TTS model under 25MB 😻 ☆3,046 · Updated last week
- Cross-platform framework for deploying LLM/VLM/TTS models locally on smartphones. ☆2,696 · Updated this week
- ⚙️ Create and run workflows (RPA 2.0) ☆3,644 · Updated this week