roboflow / maestro
Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL.
⭐2,578 · Updated this week
Alternatives and similar repositories for maestro
Users interested in maestro are comparing it to the repositories listed below.
- Recipes for shrinking, optimizing, and customizing cutting-edge vision models. ⭐1,503 · Updated 3 weeks ago
- Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API. ⭐1,681 · Updated 5 months ago
- 4M: Massively Multimodal Masked Modeling. ⭐1,740 · Updated 3 weeks ago
- Turn any computer or edge device into a command center for your computer vision projects. ⭐1,747 · Updated this week
- Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials] ⭐621 · Updated last year
- RF-DETR is a real-time object detection model architecture developed by Roboflow, SOTA on COCO and designed for fine-tuning. ⭐2,269 · Updated 2 weeks ago
- Images to inference with no labeling (use foundation models to train supervised models). ⭐2,300 · Updated last month
- MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX. ⭐1,381 · Updated this week
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ⭐2,773 · Updated this week
- The easiest way to deploy agents, MCP servers, models, RAG, pipelines and more. No MLOps. No YAML. ⭐3,339 · Updated this week
- This series will take you on a journey from the fundamentals of NLP and Computer Vision to the cutting edge of Vision-Language Models. ⭐1,092 · Updated 5 months ago
- mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding. ⭐2,202 · Updated 3 weeks ago
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou… ⭐3,344 · Updated last week
- This repository is a curated collection of the most exciting and influential CVPR 2024 papers. [Paper + Code + Demo] ⭐732 · Updated 3 weeks ago
- Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2. ⭐2,347 · Updated last month
- Everything about the SmolLM2 and SmolVLM family of models. ⭐2,590 · Updated 2 months ago
- This repository provides the code and model checkpoints for the AIMv1 and AIMv2 research projects. ⭐1,305 · Updated 2 months ago
- Fast State-of-the-Art Static Embeddings. ⭐1,743 · Updated 3 weeks ago
- State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More! ⭐1,294 · Updated 3 weeks ago
- A Kubernetes-deployable instance of GroundX for document parsing, storage, and search. ⭐758 · Updated this week
- ⭐710 · Updated last year
- This repository contains the official implementation of the research paper "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinf…". ⭐971 · Updated 7 months ago
- [arXiv 2023] Set-of-Mark Prompting for GPT-4V and LMMs. ⭐1,411 · Updated 10 months ago
- Use late-interaction multi-modal models such as ColPali in just a few lines of code. ⭐797 · Updated 4 months ago
- Vision agent. ⭐4,881 · Updated last week
- ⭐3,945 · Updated 2 weeks ago
- Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series. ⭐972 · Updated 5 months ago
- ColiVara is a suite of services that allows you to store, search, and retrieve documents based on their visual embeddings. ColiVara has st… ⭐1,138 · Updated last month
- PyTorch native post-training library. ⭐5,287 · Updated this week
- LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning. ⭐2,016 · Updated last month