PyTorch code for the paper From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models
☆210 · Jan 8, 2025 · Updated last year
Alternatives and similar repositories for COMM
Users who are interested in COMM are comparing it to the libraries listed below.
- ☆19 · Dec 6, 2023 · Updated 2 years ago
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… · ☆959 · Aug 5, 2025 · Updated 10 months ago
- Official Implementation of ICCV 2023 Paper - SegPrompt: Boosting Open-World Segmentation via Category-level Prompt Learning · ☆112 · May 28, 2025 · Updated last year
- [TMM 2023] Self-paced Curriculum Adapting of CLIP for Visual Grounding · ☆134 · Nov 10, 2025 · Updated 7 months ago
- This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model · ☆101 · Jul 15, 2024 · Updated last year
- ☆134 · Dec 22, 2023 · Updated 2 years ago
- [CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale · ☆214 · Feb 27, 2024 · Updated 2 years ago
- Official implementation for the paper "Prompt Pre-Training with Over Twenty-Thousand Classes for Open-Vocabulary Visual Recognition" · ☆258 · May 3, 2024 · Updated 2 years ago
- Official implementation of 'CLIP-DINOiser: Teaching CLIP a few DINO tricks' paper · ☆280 · Oct 26, 2024 · Updated last year
- VisionLLM Series · ☆1,148 · Feb 27, 2025 · Updated last year
- Official implementation of TagAlign · ☆37 · Dec 11, 2024 · Updated last year
- ☆59 · Aug 7, 2023 · Updated 2 years ago
- [ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of … · ☆509 · Aug 9, 2024 · Updated last year
- Recognize Any Regions · ☆123 · Dec 18, 2024 · Updated last year
- ☆813 · Jul 8, 2024 · Updated last year
- Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding" · ☆268 · Jun 12, 2024 · Updated 2 years ago
- ☆90 · Nov 25, 2023 · Updated 2 years ago
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception · ☆159 · Dec 6, 2024 · Updated last year
- [ICCV 2023] Official implementation of the paper "A Simple Framework for Open-Vocabulary Segmentation and Detection" · ☆759 · Jan 22, 2024 · Updated 2 years ago
- Emu Series: Generative Multimodal Models from BAAI · ☆1,775 · Jan 12, 2026 · Updated 5 months ago
- [CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts · ☆339 · Jul 17, 2024 · Updated last year
- EVA Series: Visual Representation Fantasies from BAAI · ☆2,683 · Aug 1, 2024 · Updated last year
- NeurIPS 2025 Spotlight; ICLR2024 Spotlight; CVPR 2024; EMNLP 2024 · ☆1,839 · Nov 27, 2025 · Updated 6 months ago
- [CVPR 2024] Official implementation of the paper "Visual In-context Learning" · ☆540 · Apr 8, 2024 · Updated 2 years ago
- A collection of visual instruction tuning datasets · ☆77 · Mar 14, 2024 · Updated 2 years ago
- Project Page for "LISA: Reasoning Segmentation via Large Language Model" · ☆2,644 · Feb 16, 2025 · Updated last year
- [NeurIPS2023] Code release for "Hierarchical Open-vocabulary Universal Image Segmentation" · ☆293 · Jun 19, 2025 · Updated 11 months ago
- [ICCV2023] VLPart: Going Denser with Open-Vocabulary Part Segmentation · ☆395 · Sep 19, 2023 · Updated 2 years ago
- [EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding · ☆50 · Jan 9, 2024 · Updated 2 years ago
- Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens" · ☆865 · May 8, 2025 · Updated last year
- [ECCV 2024] Tokenize Anything via Prompting · ☆602 · Dec 11, 2024 · Updated last year
- Grounded Language-Image Pre-training · ☆2,599 · Jan 24, 2024 · Updated 2 years ago
- A DETR-style framework for open-vocabulary detection (OVD). CVPR 2023 · ☆201 · Apr 16, 2023 · Updated 3 years ago
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings" · ☆132 · Aug 21, 2024 · Updated last year
- (ECCVW 2025) GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest · ☆555 · Jun 3, 2025 · Updated last year
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs · ☆100 · Jan 16, 2025 · Updated last year
- ☆128 · Jul 29, 2024 · Updated last year
- [ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP" · ☆900 · Aug 13, 2024 · Updated last year
- [CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want · ☆875 · Jul 20, 2025 · Updated 10 months ago