roboflow / maestro
streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
☆2,641 Updated this week
Alternatives and similar repositories for maestro
Users who are interested in maestro are comparing it to the libraries listed below.
- Recipes for shrinking, optimizing, customizing cutting edge vision models. 💜 ☆1,642 Updated last month
- Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API 🔥 ☆1,682 Updated 9 months ago
- 4M: Massively Multimodal Masked Modeling ☆1,764 Updated 4 months ago
- 👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials] ☆636 Updated last year
- Turn any computer or edge device into a command center for your computer vision projects. ☆1,997 Updated this week
- Images to inference with no labeling (use foundation models to train supervised models). ☆2,418 Updated 5 months ago
- This repository is a curated collection of the most exciting and influential CVPR 2024 papers. 🔥 [Paper + Code + Demo] ☆741 Updated 4 months ago
- RF-DETR is a real-time object detection and segmentation model architecture developed by Roboflow, SOTA on COCO and designed for fine-tun… ☆3,740 Updated last week
- This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects. ☆1,377 Updated 2 months ago
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou… ☆3,602 Updated 2 months ago
- mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding ☆2,249 Updated 4 months ago
- Everything about the SmolLM and SmolVLM family of models ☆3,332 Updated last month
- Build custom inference engines for models, agents, multi-modal systems, RAG, pipelines and more. ☆3,597 Updated last week
- ☆3,031 Updated last year
- [ICCV 2025] LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning ☆2,084 Updated last week
- This repository contains the official implementation of the research papers, "MobileCLIP" CVPR 2024 and "MobileCLIP2" TMLR August 2025 ☆1,262 Updated 2 weeks ago
- A unified library for object tracking featuring clean room re-implementations of leading multi-object tracking algorithms ☆2,148 Updated last week
- ☆714 Updated last year
- [ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy ☆2,582 Updated last week
- [arXiv 2023] Set-of-Mark Prompting for GPT-4V and LMMs ☆1,467 Updated last year
- A Python package that makes it easy for developers to create AI apps powered by various AI providers. ☆1,651 Updated 6 months ago
- tiny vision language model ☆8,814 Updated last month
- Collection of notebook guides created by the Brev.dev team! ☆1,798 Updated last week
- MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX. ☆1,733 Updated last week
- PyTorch code and models for V-JEPA self-supervised learning from video. ☆3,245 Updated 7 months ago
- Use late-interaction multi-modal models such as ColPali in just a few lines of code. ☆825 Updated 8 months ago
- Vision agent ☆5,081 Updated last month
- Benchmark LLMs by fighting in Street Fighter 3! The new way to evaluate the quality of an LLM ☆1,447 Updated 7 months ago
- ☆2,016 Updated last week
- This repository is a curated collection of the most exciting and influential CVPR 2025 papers. 🔥 [Paper + Code + Demo] ☆795 Updated 4 months ago