roboflow / maestro
streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
★2,513 · Updated this week
Alternatives and similar repositories for maestro:
Users who are interested in maestro are comparing it to the libraries listed below.
- Recipes for shrinking, optimizing, customizing cutting edge vision models. ★1,316 · Updated last week
- Turn any computer or edge device into a command center for your computer vision projects. ★1,578 · Updated this week
- Images to inference with no labeling (use foundation models to train supervised models). ★2,176 · Updated 3 months ago
- Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API 🔥 ★1,678 · Updated 2 months ago
- 4M: Massively Multimodal Masked Modeling ★1,701 · Updated 2 weeks ago
- Lightning-fast serving engine for any AI model of any size. Flexible. Easy. Enterprise-scale. ★2,996 · Updated this week
- MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX. ★1,027 · Updated this week
- LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning ★1,899 · Updated 2 months ago
- The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol. ★1,614 · Updated this week
- ★2,889 · Updated 6 months ago
- mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding ★2,137 · Updated 2 months ago
- Everything about the SmolLM2 and SmolVLM family of models ★2,035 · Updated last week
- 👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials] ★606 · Updated last year
- PyTorch native post-training library ★5,014 · Updated this week
- NVIDIA Ingest is an early access set of microservices for parsing hundreds of thousands of complex, messy unstructured PDFs and other ent… ★2,604 · Updated this week
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou… ★3,025 · Updated 2 weeks ago
- ★1,567 · Updated last week
- This repository is a curated collection of the most exciting and influential CVPR 2024 papers. 🔥 [Paper + Code + Demo] ★708 · Updated 8 months ago
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ★2,568 · Updated this week
- This series will take you on a journey from the fundamentals of NLP and Computer Vision to the cutting edge of Vision-Language Models. ★1,049 · Updated last month
- ★3,572 · Updated 3 weeks ago
- Vision infrastructure to turn complex documents into RAG/LLM-ready data ★2,017 · Updated this week
- Run PyTorch LLMs locally on servers, desktop and mobile ★3,530 · Updated this week
- A Self-adaptation Framework that adapts LLMs for unseen tasks in real-time! ★1,002 · Updated last month
- Llama-3 agents that can browse the web by following instructions and talking to you ★1,391 · Updated 3 months ago
- ★708 · Updated last year
- Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-… ★3,321 · Updated last month