roboflow / maestro
streamline the fine-tuning process for multimodal models: PaliGemma, Florence-2, and Qwen2-VL
⭐ 1,390 · Updated this week
Related projects
Alternatives and complementary repositories for maestro
- 👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials] ⭐ 577 · Updated 8 months ago
- Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API 🔥 ⭐ 1,647 · Updated 8 months ago
- Set-of-Mark Prompting for GPT-4V and LMMs ⭐ 1,185 · Updated 3 months ago
- Recipes for shrinking, optimizing, customizing cutting edge vision models. ⭐ 890 · Updated 2 months ago
- 4M: Massively Multimodal Masked Modeling ⭐ 1,607 · Updated last month
- 【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection ⭐ 3,003 · Updated last month
- A fast, easy-to-use, production-ready inference server for computer vision supporting deployment of many popular model architectures and … ⭐ 1,370 · Updated this week
- Mixture-of-Experts for Large Vision-Language Models ⭐ 1,989 · Updated 6 months ago
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills ⭐ 705 · Updated 9 months ago
- Images to inference with no labeling (use foundation models to train supervised models); see the sketch after this list. ⭐ 1,989 · Updated 2 weeks ago
- EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything ⭐ 2,160 · Updated 5 months ago
- This repository is a curated collection of the most exciting and influential CVPR 2024 papers. 🔥 [Paper + Code + Demo] ⭐ 664 · Updated 4 months ago
- VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and… ⭐ 1,999 · Updated 3 weeks ago
- TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones ⭐ 1,251 · Updated 7 months ago
- ICLR 2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Expert… ⭐ 1,255 · Updated this week
- Janus-Series: Unified Multimodal Understanding and Generation Models ⭐ 1,084 · Updated last week
- PyTorch code and models for V-JEPA self-supervised learning from video. ⭐ 2,673 · Updated 3 months ago
- Accelerate your Hugging Face Transformers 7.6-9x. Native to Hugging Face and PyTorch. ⭐ 687 · Updated 2 months ago
- LLaVA-Interactive-Demo ⭐ 352 · Updated 3 months ago
- Official implementation of the CVPR 2024 highlight paper: Matching Anything by Segmenting Anything ⭐ 1,004 · Updated 2 weeks ago
- [NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments ⭐ 1,397 · Updated this week
- A novel implementation of fusing ViT with Mamba into a fast, agile, and high-performance multimodal model. Powered by Zeta, the simplest… ⭐ 438 · Updated last week
- PyTorch implementation of "V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs" ⭐ 526 · Updated 10 months ago
- Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2 ⭐ 1,141 · Updated 2 weeks ago
- The code used to train and run inference with the ColPali architecture. ⭐ 1,132 · Updated this week
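One pattern worth calling out from the list above is autolabeling ("images to inference with no labeling"): a large foundation model labels raw images, and a small supervised model is then trained on the result. Below is a minimal sketch following the pattern in the autodistill docs; the ontology contents, folder paths, and epoch count are assumptions to adapt.

```python
# Sketch of foundation-model autolabeling with autodistill (assumed setup:
# pip install autodistill autodistill-grounded-sam autodistill-yolov8).
from autodistill.detection import CaptionOntology
from autodistill_grounded_sam import GroundedSAM
from autodistill_yolov8 import YOLOv8

# Map natural-language prompts (keys) to the class names (values) the
# generated dataset should carry.
ontology = CaptionOntology({"shipping container": "container"})

# Auto-label a folder of raw images with the foundation model...
base_model = GroundedSAM(ontology=ontology)
base_model.label(input_folder="./images", output_folder="./dataset")

# ...then distill the labels into a small, fast supervised model.
target_model = YOLOv8("yolov8n.pt")
target_model.train("./dataset/data.yaml", epochs=50)
```

The resulting YOLOv8 weights run orders of magnitude faster than the foundation model that produced the labels, which is the whole point of the distillation step.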