[ECCV2024 Oral 🔥] Official Implementation of "GiT: Towards Generalist Vision Transformer through Universal Language Interface"
★362 · Jan 14, 2025 · Updated last year
Alternatives and similar repositories for GiT
Users that are interested in GiT are comparing it to the libraries listed below.
- [ICCV2023] Official Implementation of "UniTR: A Unified and Efficient Multi-Modal Transformer for Bird’s-Eye-View Representation" ★354 · Sep 4, 2024 · Updated last year
- [CVPR2023] Official Implementation of "DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets" ★451 · Sep 4, 2024 · Updated last year
- [CVPR 2024] Official implementation of the paper "Visual In-context Learning" ★531 · Apr 8, 2024 · Updated 2 years ago
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings" ★131 · Aug 21, 2024 · Updated last year
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… ★951 · Aug 5, 2025 · Updated 8 months ago
- [CVPR2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale ★1,170 · Oct 21, 2024 · Updated last year
- [ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization ★586 · Jun 7, 2024 · Updated last year
- [CVPR2022] This is the official code of "RBGNet: Ray-based Grouping for 3D Object Detection". ★40 · Jul 17, 2023 · Updated 2 years ago
- [CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era" ★212 · Jun 9, 2024 · Updated last year
- Official Repo For OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24] ★1,345 · Oct 15, 2025 · Updated 6 months ago
- This is the official code release for our work, Denoising Vision Transformers. ★396 · Nov 13, 2024 · Updated last year
- VisionLLM Series ★1,142 · Feb 27, 2025 · Updated last year
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design. ★1,992 · Nov 7, 2025 · Updated 5 months ago
- [CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts ★337 · Jul 17, 2024 · Updated last year
- VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation ★86 · Sep 12, 2024 · Updated last year
- [ECCV 2024] The official code of paper "Open-Vocabulary SAM". ★1,034 · Aug 4, 2025 · Updated 8 months ago
- [ICCV 23] A Simple Vision Transformer for Weakly Semi-supervised 3D Object Detection ★13 · Apr 12, 2024 · Updated 2 years ago
- [NeurIPS2025 Spotlight 🔥] Official implementation of 🛸 "UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Langu… ★271 · Nov 5, 2025 · Updated 5 months ago
- [ECCV 2024] Tokenize Anything via Prompting ★601 · Dec 11, 2024 · Updated last year
- [ICLR 2025 oral] RMP-SAM: Towards Real-Time Multi-Purpose Segment Anything ★268 · Apr 11, 2025 · Updated last year
- [ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition&Understanding and General Relation Comprehension of … ★506 · Aug 9, 2024 · Updated last year
- [CVPR 2024] PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding. ★260 · Feb 11, 2025 · Updated last year
- VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks ★392 · Jul 9, 2024 · Updated last year
- Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024) ★186 · Jul 5, 2024 · Updated last year
- ★1,843 · Jun 28, 2024 · Updated last year
- Project Page for "LISA: Reasoning Segmentation via Large Language Model" ★2,620 · Feb 16, 2025 · Updated last year
- Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation ★1,947 · Aug 15, 2024 · Updated last year
- ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention (ECCV 2024) ★80 · May 20, 2025 · Updated 10 months ago
- [CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation". ★456 · Aug 8, 2025 · Updated 8 months ago
- PyTorch Implementation of "V* : Guided Visual Search as a Core Mechanism in Multimodal LLMs" ★702 · Jan 7, 2024 · Updated 2 years ago
- EVA Series: Visual Representation Fantasies from BAAI ★2,664 · Aug 1, 2024 · Updated last year
- Official repository for "AM-RADIO: Reduce All Domains Into One" ★1,759 · Apr 9, 2026 · Updated last week
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception ★158 · Dec 6, 2024 · Updated last year
- [CVPR 2024 Highlight] Visual Point Cloud Forecasting ★348 · Jul 2, 2025 · Updated 9 months ago
- (ECCVW 2025) GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest ★555 · Jun 3, 2025 · Updated 10 months ago
- This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects. ★1,414 · Aug 4, 2025 · Updated 8 months ago
- [CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI ★660 · Jun 13, 2025 · Updated 10 months ago
- [NeurIPS24 Spotlight] Voxel Mamba: Group-Free State Space Models for Point Cloud based 3D Object Detection ★158 · Sep 26, 2024 · Updated last year
- NeurIPS 2025 Spotlight; ICLR2024 Spotlight; CVPR 2024; EMNLP 2024 ★1,832 · Nov 27, 2025 · Updated 4 months ago