👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]
★637 · Feb 29, 2024 · Updated 2 years ago
Alternatives and similar repositories for awesome-foundation-and-multimodal-models
Users that are interested in awesome-foundation-and-multimodal-models are comparing it to the libraries listed below.
- Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API 🔥 · ★1,685 · Jan 14, 2025 · Updated last year
- streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL · ★2,671 · May 1, 2026 · Updated last week
- The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode… · ★12 · Jul 30, 2024 · Updated last year
- Each week I create sketches covering key Computer Vision concepts. If you want to learn more about CV, stick around! · ★150 · Mar 13, 2023 · Updated 3 years ago
- ★548 · Nov 7, 2024 · Updated last year
- NeurIPS 2025 Spotlight; ICLR 2024 Spotlight; CVPR 2024; EMNLP 2024 · ★1,834 · Nov 27, 2025 · Updated 5 months ago
- Images to inference with no labeling (use foundation models to train supervised models). · ★2,688 · May 14, 2025 · Updated 11 months ago
- ★135 · Nov 24, 2023 · Updated 2 years ago
- ★719 · Mar 6, 2024 · Updated 2 years ago
- List of resources, libraries and more for developers who would like to build with open-source machine learning off-the-shelf · ★197 · Apr 1, 2024 · Updated 2 years ago
- Recipes for shrinking, optimizing, customizing cutting edge vision models. · ★1,914 · Jan 9, 2026 · Updated 4 months ago
- YOLOv10: Real-Time End-to-End Object Detection · ★12 · May 24, 2024 · Updated last year
- [CVPR 2024] VCoder: Versatile Vision Encoders for Multimodal Large Language Models · ★279 · Apr 17, 2024 · Updated 2 years ago
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills · ★766 · Feb 1, 2024 · Updated 2 years ago
- a state-of-the-art-level open visual language model | multimodal pretrained model · ★6,739 · May 29, 2024 · Updated last year
- This repository is a curated collection of the most exciting and influential CVPR 2024 papers. 🔥 [Paper + Code + Demo] · ★735 · Apr 15, 2026 · Updated 3 weeks ago
- A collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures l… · ★9,364 · Mar 27, 2026 · Updated last month
- 【TMM 2025🔥】 Mixture-of-Experts for Large Vision-Language Models · ★2,314 · Jul 15, 2025 · Updated 9 months ago
- Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more. · ★3,436 · May 19, 2025 · Updated 11 months ago
- This repository is a curated collection of the most exciting and influential CVPR 2023 papers. 🔥 [Paper + Code] · ★647 · Apr 15, 2026 · Updated 3 weeks ago
- ★14 · Dec 7, 2023 · Updated 2 years ago
- Latest Advances on Multimodal Large Language Models · ★17,736 · May 1, 2026 · Updated last week
- YOLOExplorer: Iterate on your YOLO / CV datasets using SQL, vector semantic search, and more within seconds · ★141 · Apr 6, 2026 · Updated last month
- Unofficial implementation and experiments related to Set-of-Mark (SoM) 👁️ · ★87 · Oct 20, 2023 · Updated 2 years ago
- [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond. · ★24,753 · Aug 12, 2024 · Updated last year
- Gradio UI for a Cog API · ★70 · Apr 8, 2024 · Updated 2 years ago
- A component that allows you to annotate an image with points and boxes. · ★21 · Dec 12, 2023 · Updated 2 years ago
- Official Code for Tracking Any Object Amodally · ★123 · Jul 11, 2024 · Updated last year
- [CVPR 2024] Real-Time Open-Vocabulary Object Detection · ★6,336 · Feb 26, 2025 · Updated last year
- Radiantloom Email Assist 7B is an email-assistant large language model fine-tuned from Zephyr-7B-Beta over a custom-curated dataset of 1… · ★14 · Jan 19, 2024 · Updated 2 years ago
- 🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3) · ★844 · Aug 5, 2025 · Updated 9 months ago
- A family of lightweight multimodal models. · ★1,054 · Nov 18, 2024 · Updated last year
- Testing and evaluating the capabilities of Vision-Language models (PaliGemma) in performing computer vision tasks such as object detectio… · ★88 · May 29, 2024 · Updated last year
- This repository contains demos I made with the Transformers library by HuggingFace. · ★11,620 · Apr 20, 2026 · Updated 2 weeks ago
- [ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters · ★5,928 · Mar 14, 2024 · Updated 2 years ago
- Easily train or fine-tune SOTA computer vision models with one open source training library. The home of Yolo-NAS. · ★5,028 · Feb 24, 2026 · Updated 2 months ago
- TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale. · ★1,716 · Apr 27, 2026 · Updated last week
- 4M: Massively Multimodal Masked Modeling · ★1,794 · Jun 2, 2025 · Updated 11 months ago
- InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions · ★2,924 · May 26, 2025 · Updated 11 months ago