roboflow / maestro
streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
★2,651 · Updated this week
Alternatives and similar repositories for maestro
Users who are interested in maestro are comparing it to the libraries listed below.
- Recipes for shrinking, optimizing, customizing cutting edge vision models. ★1,853 · Updated last week
- 👁️ + 💬 + 🧠 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials] ★636 · Updated last year
- 4M: Massively Multimodal Masked Modeling ★1,783 · Updated 7 months ago
- Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API 🔥 ★1,683 · Updated last year
- Turn any computer or edge device into a command center for your computer vision projects. ★2,160 · Updated this week
- Images to inference with no labeling (use foundation models to train supervised models). ★2,565 · Updated 8 months ago
- mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding ★2,265 · Updated 7 months ago
- MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX. ★1,999 · Updated this week
- Use late-interaction multi-modal models such as ColPali in just a few lines of code. ★838 · Updated 11 months ago
- NeMo Retriever extraction is a scalable, performance-oriented document content and metadata extraction microservice. NeMo Retriever extra… ★2,799 · Updated last week
- A minimal Python framework for building custom AI inference servers with full control over logic, batching, and scaling. ★3,766 · Updated last week
- 🤖 MLE-Agent: Your intelligent companion for seamless AI engineering and research. Integrate with arxiv and paper with code to provide… ★1,501 · Updated 5 months ago
- The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol. ★2,442 · Updated last week
- This series will take you on a journey from the fundamentals of NLP and Computer Vision to the cutting edge of Vision-Language Models. ★1,154 · Updated 11 months ago
- This repository is a curated collection of the most exciting and influential CVPR 2024 papers. 🔥 [Paper + Code + Demo] ★743 · Updated 7 months ago
- Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-… ★3,820 · Updated 8 months ago
- ★3,062 · Updated last month
- ★716 · Updated last year
- This repository shares end-to-end notebooks on how to use various Weaviate features and integrations! ★933 · Updated last month
- A Python package that makes it easy for developers to create AI apps powered by various AI providers. ★1,647 · Updated 9 months ago
- Fast State-of-the-Art Static Embeddings ★1,982 · Updated 2 weeks ago
- Collection of notebook guides created by the Brev.dev team! ★1,808 · Updated this week
- A unified library for object tracking featuring clean room re-implementations of leading multi-object tracking algorithms ★2,206 · Updated this week
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou… ★3,718 · Updated last month
- [ICCV 2025] LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning ★2,110 · Updated last month
- ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows. ★1,475 · Updated 4 months ago
- Vision-Augmented Retrieval and Generation (VARAG) - Vision first RAG Engine ★491 · Updated 5 months ago
- Colivara is a suite of services that allows you to store, search, and retrieve documents based on their visual embedding. ColiVara has st… ★1,413 · Updated 8 months ago
- Llama-3 agents that can browse the web by following instructions and talking to you ★1,408 · Updated last year
- Knowledge Agents and Management in the Cloud ★4,226 · Updated last week