roboflow / maestro
Streamline the fine-tuning process for multimodal models: PaliGemma, Florence-2, and Qwen2-VL
☆1,427 · Updated this week
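maestro's pitch is to hide the boilerplate behind runs like the one sketched below. For orientation, this is a minimal PaliGemma fine-tuning loop written directly against Hugging Face transformers (the kind of setup maestro wraps, not maestro's own API). The checkpoint ID, the toy dataset, the prefix/suffix field names, and all hyperparameters are illustrative assumptions.

```python
import torch
from PIL import Image
from transformers import (
    AutoProcessor,
    PaliGemmaForConditionalGeneration,
    Trainer,
    TrainingArguments,
)

MODEL_ID = "google/paligemma-3b-pt-224"  # assumed base checkpoint (gated on the Hub)

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = PaliGemmaForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16
)

# Toy in-memory dataset standing in for a real annotated one: each example
# pairs an image with a task prompt ("prefix") and target text ("suffix").
train_dataset = [
    {
        "image": Image.new("RGB", (224, 224), "white"),
        "prefix": "caption en",
        "suffix": "a blank white square",
    }
]

def collate(examples):
    # PaliGemmaProcessor builds input_ids from the prefixes and labels
    # from the suffixes in a single call.
    return processor(
        text=[e["prefix"] for e in examples],
        images=[e["image"] for e in examples],
        suffix=[e["suffix"] for e in examples],
        return_tensors="pt",
        padding="longest",
    )

args = TrainingArguments(
    output_dir="paligemma-ft",  # assumed output path
    num_train_epochs=1,
    per_device_train_batch_size=1,
    learning_rate=2e-5,
    report_to="none",
)

Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=collate,
).train()
```

In practice you would swap the toy list for a real annotated dataset and tune the batch size and learning rate per model; maestro's value is packaging these choices behind per-model recipes.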
Alternatives and similar repositories for maestro:
Users interested in maestro are comparing it to the libraries listed below
- 👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials] ☆596 · Updated 10 months ago
- ☆704 · Updated 10 months ago
- Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API 🔥 ☆1,667 · Updated this week
- Set-of-Mark Prompting for GPT-4V and LMMs ☆1,241 · Updated 4 months ago
- Recipes for shrinking, optimizing, customizing cutting-edge vision models ☆1,093 · Updated 3 weeks ago
- 4M: Massively Multimodal Masked Modeling ☆1,666 · Updated 3 months ago
- Turn any computer or edge device into a command center for your computer vision projects. ☆1,443 · Updated this week
- Images to inference with no labeling (use foundation models to train supervised models). ☆2,058 · Updated last month
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou… ☆2,752 · Updated last week
- 【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection ☆3,113 · Updated last month
- Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series ☆853 · Updated 5 months ago
- ICLR 2024 Spotlight: curation/training code, metadata, distribution, and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Expert… ☆1,337 · Updated last month
- EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything ☆2,237 · Updated 3 weeks ago
- This repository contains the official implementation of the research paper, "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinf… ☆792 · Updated last month
- This repository is a curated collection of the most exciting and influential CVPR 2024 papers. 🔥 [Paper + Code + Demo] ☆685 · Updated 6 months ago
- Mixture-of-Experts for Large Vision-Language Models ☆2,051 · Updated last month
- ☆3,272 · Updated 3 months ago
- This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects. ☆1,143 · Updated last month
- TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones ☆1,256 · Updated 9 months ago
- DeepSeek-VL: Towards Real-World Vision-Language Understanding ☆2,306 · Updated 8 months ago
- LLaVA-Interactive-Demo ☆360 · Updated 5 months ago
- PyTorch code and models for V-JEPA self-supervised learning from video. ☆2,745 · Updated 5 months ago
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills ☆720 · Updated 11 months ago
- An Open-source Toolkit for LLM Development ☆2,747 · Updated this week
- Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more. ☆2,516 · Updated 3 weeks ago
- Official implementation of the CVPR 2024 highlight paper: Matching Anything by Segmenting Anything ☆1,152 · Updated 2 months ago
- VisionLLM Series ☆977 · Updated 2 weeks ago
- Training LLMs with QLoRA + FSDP ☆1,436 · Updated 2 months ago
- [CVPR 2024] Real-Time Open-Vocabulary Object Detection ☆4,942 · Updated 2 months ago