[ECCV2024 Oral🔥] Official Implementation of "GiT: Towards Generalist Vision Transformer through Universal Language Interface"
☆360 · Jan 14, 2025 · Updated last year
Alternatives and similar repositories for GiT
Users that are interested in GiT are comparing it to the libraries listed below
- [ICCV2023] Official Implementation of "UniTR: A Unified and Efficient Multi-Modal Transformer for Bird’s-Eye-View Representation" ☆351 · Sep 4, 2024 · Updated last year
- [CVPR 2024] Official implementation of the paper "Visual In-context Learning" ☆529 · Apr 8, 2024 · Updated last year
- [CVPR2023] Official Implementation of "DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets" ☆448 · Sep 4, 2024 · Updated last year
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… ☆945 · Aug 5, 2025 · Updated 7 months ago
- [CVPR2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale ☆1,170 · Oct 21, 2024 · Updated last year
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings" ☆131 · Aug 21, 2024 · Updated last year
- [ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization ☆583 · Jun 7, 2024 · Updated last year
- [CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts ☆336 · Jul 17, 2024 · Updated last year
- This is the official code release for our work, Denoising Vision Transformers. ☆394 · Nov 13, 2024 · Updated last year
- [CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era" ☆211 · Jun 9, 2024 · Updated last year
- Official Repo For OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24] ☆1,342 · Oct 15, 2025 · Updated 4 months ago
- VisionLLM Series ☆1,137 · Feb 27, 2025 · Updated last year
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design. ☆1,986 · Nov 7, 2025 · Updated 4 months ago
- VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation ☆86 · Sep 12, 2024 · Updated last year
- [ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition&Understanding and General Relation Comprehension of … ☆505 · Aug 9, 2024 · Updated last year
- [ECCV 2024] The official code of paper "Open-Vocabulary SAM". ☆1,029 · Aug 4, 2025 · Updated 7 months ago
- [ECCV 2024] Tokenize Anything via Prompting ☆602 · Dec 11, 2024 · Updated last year
- [ICLR 2025 oral] RMP-SAM: Towards Real-Time Multi-Purpose Segment Anything ☆268 · Apr 11, 2025 · Updated 10 months ago
- VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks ☆391 · Jul 9, 2024 · Updated last year
- [CVPR 2024] PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding. ☆255 · Feb 11, 2025 · Updated last year
- ☆1,842 · Jun 28, 2024 · Updated last year
- Official repository for "AM-RADIO: Reduce All Domains Into One" ☆1,682 · Feb 11, 2026 · Updated 3 weeks ago
- PyTorch Implementation of "V* : Guided Visual Search as a Core Mechanism in Multimodal LLMs" ☆691 · Jan 7, 2024 · Updated 2 years ago
- Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation ☆1,937 · Aug 15, 2024 · Updated last year
- (NeurIPS2023) CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection ☆123 · Apr 26, 2024 · Updated last year
- OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving ☆195 · May 31, 2024 · Updated last year
- This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects. ☆1,403 · Aug 4, 2025 · Updated 7 months ago
- An Efficient, Flexible, and General deep learning framework that retains minimal. ☆130 · Dec 25, 2023 · Updated 2 years ago
- Official repository for the paper PLLaVA ☆676 · Jul 28, 2024 · Updated last year
- Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024) ☆186 · Jul 5, 2024 · Updated last year
- EVA Series: Visual Representation Fantasies from BAAI ☆2,648 · Aug 1, 2024 · Updated last year
- [CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation". ☆443 · Aug 8, 2025 · Updated 6 months ago
- [ICCV 23] A Simple Vision Transformer for Weakly Semi-supervised 3D Object Detection ☆13 · Apr 12, 2024 · Updated last year
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception ☆159 · Dec 6, 2024 · Updated last year
- [CVPR 2024 Highlight] Visual Point Cloud Forecasting ☆346 · Jul 2, 2025 · Updated 8 months ago
- Official implementation of 'CLIP-DINOiser: Teaching CLIP a few DINO tricks' paper. ☆275 · Oct 26, 2024 · Updated last year
- Project Page for "LISA: Reasoning Segmentation via Large Language Model" ☆2,589 · Feb 16, 2025 · Updated last year
- [ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding ☆1,082 · Jul 6, 2024 · Updated last year
- [ECCV 2024] 3D World Model for Autonomous Driving ☆525 · Apr 12, 2024 · Updated last year