roboflow / maestro
streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
★2,642 · Updated this week
Alternatives and similar repositories for maestro
Users who are interested in maestro compare it to the libraries listed below.
- Recipes for shrinking, optimizing, customizing cutting edge vision models. ★1,799 · Updated last month
- Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API ★1,683 · Updated 10 months ago
- 4M: Massively Multimodal Masked Modeling ★1,773 · Updated 6 months ago
- Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials] ★635 · Updated last year
- Turn any computer or edge device into a command center for your computer vision projects. ★2,090 · Updated this week
- Images to inference with no labeling (use foundation models to train supervised models). ★2,497 · Updated 6 months ago
- MLE-Agent: Your intelligent companion for seamless AI engineering and research. Integrate with arxiv and paper with code to provide… ★1,424 · Updated 4 months ago
- RF-DETR is a real-time object detection and segmentation model architecture developed by Roboflow, SOTA on COCO and designed for fine-tun… ★4,545 · Updated 3 weeks ago
- Build custom inference engines for models, agents, multi-modal systems, RAG, pipelines and more. ★3,721 · Updated this week
- This repository is a curated collection of the most exciting and influential CVPR 2024 papers. [Paper + Code + Demo] ★744 · Updated 6 months ago
- This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects. ★1,387 · Updated 4 months ago
- Vision agent ★5,136 · Updated this week
- A unified library for object tracking featuring clean room re-implementations of leading multi-object tracking algorithms ★2,176 · Updated last week
- mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding ★2,260 · Updated 6 months ago
- Everything about the SmolLM and SmolVLM family of models ★3,433 · Updated 2 weeks ago
- Use late-interaction multi-modal models such as ColPali in just a few lines of code. ★832 · Updated 10 months ago
- [ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy ★2,604 · Updated last month
- Collection of notebook guides created by the Brev.dev team! ★1,804 · Updated 3 weeks ago
- This series will take you on a journey from the fundamentals of NLP and Computer Vision to the cutting edge of Vision-Language Models. ★1,152 · Updated 10 months ago
- ★715 · Updated last year
- [arXiv 2023] Set-of-Mark Prompting for GPT-4V and LMMs ★1,487 · Updated last year
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou… ★3,679 · Updated last week
- ★3,039 · Updated 2 weeks ago
- The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol. ★2,354 · Updated last week
- ★2,078 · Updated 2 weeks ago
- Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and video, up to 5x faster than… ★1,198 · Updated last month
- Colivara is a suite of services that allows you to store, search, and retrieve documents based on their visual embedding. ColiVara has st… ★1,387 · Updated 7 months ago
- The simplest, fastest repository for training/finetuning small-sized VLMs. ★4,331 · Updated last month
- 【TMM 2025】 Mixture-of-Experts for Large Vision-Language Models ★2,278 · Updated 4 months ago
- Framework for enhancing LLMs for RAG tasks using fine-tuning. ★758 · Updated 6 months ago