roboflow / maestro
streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
☆2,542 · Updated last week
Alternatives and similar repositories for maestro:
Users that are interested in maestro are comparing it to the libraries listed below
- Recipes for shrinking, optimizing, and customizing cutting-edge vision models. ☆1,407 · Updated 3 weeks ago
- Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials] ☆611 · Updated last year
- 4M: Massively Multimodal Masked Modeling. ☆1,713 · Updated last month
- Turn any computer or edge device into a command center for your computer vision projects. ☆1,629 · Updated this week
- Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API. ☆1,678 · Updated 3 months ago
- MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX. ☆1,173 · Updated this week
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud. ☆3,138 · Updated 3 weeks ago
- A toolkit to create an optimal, production-ready Retrieval-Augmented Generation (RAG) setup for your data. ☆1,406 · Updated 2 months ago
- ☆1,623 · Updated this week
- A Python package that makes it easy for developers to create AI apps powered by various AI providers. ☆1,595 · Updated last week
- LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning. ☆1,958 · Updated last week
- Easily use and train state-of-the-art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-… ☆3,396 · Updated 2 months ago
- Deploy high-performance AI models and inference pipelines on FastAPI with built-in batching, streaming, and more. ☆3,057 · Updated last week
- Vision-Augmented Retrieval and Generation (VARAG) - a vision-first RAG engine. ☆448 · Updated 3 months ago
- Images to inference with no labeling (use foundation models to train supervised models). ☆2,223 · Updated last month
- This repository is a curated collection of the most exciting and influential CVPR 2024 papers. [Paper + Code + Demo] ☆713 · Updated 9 months ago
- Python & JS/TS SDK for running AI-generated code/code interpreting in your AI app. ☆1,672 · Updated this week
- ColiVara is a suite of services that allows you to store, search, and retrieve documents based on their visual embedding. ColiVara has st… ☆892 · Updated 2 months ago
- Use late-interaction multi-modal models such as ColPali in just a few lines of code. ☆771 · Updated 2 months ago
- This repository provides the code and model checkpoints for the AIMv1 and AIMv2 research projects. ☆1,267 · Updated 4 months ago
- Everything about the SmolLM2 and SmolVLM family of models. ☆2,201 · Updated 3 weeks ago
- This series will take you on a journey from the fundamentals of NLP and Computer Vision to the cutting edge of Vision-Language Models. ☆1,061 · Updated 2 months ago
- Implementing the 4 agentic patterns from scratch. ☆1,238 · Updated last month
- Document-to-Markdown OCR library with Llama 3.2 Vision. ☆2,260 · Updated 3 months ago
- ☆3,712 · Updated last month
- [CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents. ☆1,595 · Updated this week
- The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol. ☆1,743 · Updated this week
- tiny vision language model. ☆7,796 · Updated last week
- Witness the aha moment of VLM with less than $3. ☆3,549 · Updated last month
- mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding. ☆2,156 · Updated 3 months ago