roboflow / maestro
Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
★2,659 · Updated 2 weeks ago
Alternatives and similar repositories for maestro
Users who are interested in maestro are comparing it to the libraries listed below.
- Recipes for shrinking, optimizing, customizing cutting edge vision models. ★1,869 · Updated last month
- 4M: Massively Multimodal Masked Modeling ★1,789 · Updated 8 months ago
- Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API 🔥 ★1,684 · Updated last year
- Images to inference with no labeling (use foundation models to train supervised models). ★2,616 · Updated 8 months ago
- A minimal Python framework for building custom AI inference servers with full control over logic, batching, and scaling. ★3,797 · Updated this week
- Turn any computer or edge device into a command center for your computer vision projects. ★2,183 · Updated this week
- 👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials] ★637 · Updated last year
- ★3,071 · Updated 2 months ago
- This repository is a curated collection of the most exciting and influential CVPR 2024 papers. 🔥 [Paper + Code + Demo] ★742 · Updated 8 months ago
- tiny vision language model ★9,303 · Updated 2 months ago
- Trackers gives you clean, modular re-implementations of leading multi-object tracking algorithms released under the permissive Apache 2.0… ★2,389 · Updated this week
- This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects. ★1,396 · Updated 6 months ago
- Use late-interaction multi-modal models such as ColPali in just a few lines of code. ★842 · Updated last year
- [ICCV 2025] LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning ★2,124 · Updated last month
- This series will take you on a journey from the fundamentals of NLP and Computer Vision to the cutting edge of Vision-Language Models. ★1,154 · Updated last year
- 🤖 MLE-Agent: Your intelligent companion for seamless AI engineering and research. Integrate with arxiv and paper with code to provide… ★1,510 · Updated 6 months ago
- Fast State-of-the-Art Static Embeddings ★1,992 · Updated last month
- [ICLR 2026] RF-DETR is a real-time object detection and segmentation model architecture developed by Roboflow, SOTA on COCO, designed for… ★5,527 · Updated this week
- Llama-3 agents that can browse the web by following instructions and talking to you ★1,407 · Updated last year
- PyTorch code and models for V-JEPA self-supervised learning from video. ★3,499 · Updated 11 months ago
- [ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy ★2,630 · Updated 3 months ago
- Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and video, up to 5x faster than… ★1,217 · Updated 3 months ago
- A Python package that makes it easy for developers to create AI apps powered by various AI providers. ★1,646 · Updated 10 months ago
- Everything about the SmolLM and SmolVLM family of models ★3,594 · Updated 3 weeks ago
- Cohere Toolkit is a collection of prebuilt components enabling users to quickly build and deploy RAG applications. ★3,154 · Updated this week
- NeMo Retriever extraction is a scalable, performance-oriented document content and metadata extraction microservice. NeMo Retriever extra… ★2,840 · Updated this week
- Vision-Augmented Retrieval and Generation (VARAG) - Vision first RAG Engine ★493 · Updated 6 months ago
- This repository contains the official implementation of the research papers, "MobileCLIP" CVPR 2024 and "MobileCLIP2" TMLR August 2025 ★1,416 · Updated 4 months ago
- mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding ★2,368 · Updated 8 months ago
- YOLOE: Real-Time Seeing Anything [ICCV 2025] ★2,029 · Updated 7 months ago