roboflow / maestro
streamline the fine-tuning process for multimodal models: PaliGemma, Florence-2, and Qwen2-VL
★1,381 · Updated this week
Related projects
Alternatives and complementary repositories for maestro
- Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API 🔥 ★1,644 · Updated 8 months ago
- 👁️ + 💬 + 🧠 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials] ★571 · Updated 8 months ago
- Set-of-Mark Prompting for GPT-4V and LMMs ★1,165 · Updated 2 months ago
- 4M: Massively Multimodal Masked Modeling ★1,601 · Updated last month
- Recipes for shrinking, optimizing, customizing cutting edge vision models. ★865 · Updated 2 months ago
- A fast, easy-to-use, production-ready inference server for computer vision supporting deployment of many popular model architectures and … ★1,356 · Updated this week
- ★699 · Updated 8 months ago
- VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and… ★1,980 · Updated last week
- PyTorch code and models for V-JEPA self-supervised learning from video. ★2,665 · Updated 3 months ago
- Images to inference with no labeling (use foundation models to train supervised models). ★1,974 · Updated last week
- Mixture-of-Experts for Large Vision-Language Models ★1,975 · Updated 5 months ago
- DeepSeek-VL: Towards Real-World Vision-Language Understanding ★2,065 · Updated 6 months ago
- EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything ★2,152 · Updated 5 months ago
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills ★702 · Updated 9 months ago
- 【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection ★2,968 · Updated last month
- This repository provides the code and model checkpoints of the research paper: Scalable Pre-training of Large Autoregressive Image Model… ★696 · Updated 6 months ago
- TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones ★1,248 · Updated 6 months ago
- This repository is a curated collection of the most exciting and influential CVPR 2024 papers. 🔥 [Paper + Code + Demo] ★662 · Updated 4 months ago
- [NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments ★1,344 · Updated this week
- ★2,837 · Updated 3 weeks ago
- Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation ★917 · Updated last week
- API for Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series ★771 · Updated 3 months ago
- LLaVA-Interactive-Demo ★351 · Updated 3 months ago
- Mora: More like Sora for Generalist Video Generation ★1,513 · Updated last month
- ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Expert… ★1,248 · Updated 2 weeks ago
- Accelerate your Hugging Face Transformers 7.6-9x. Native to Hugging Face and PyTorch. ★687 · Updated 2 months ago
- VisionLLM Series ★904 · Updated 3 weeks ago
- Training LLMs with QLoRA + FSDP ★1,418 · Updated this week
- ★448 · Updated 7 months ago