[ECCV2024 Oral 🔥] Official Implementation of "GiT: Towards Generalist Vision Transformer through Universal Language Interface"
⭐ 362 · Jan 14, 2025 · Updated last year
Alternatives and similar repositories for GiT
Users that are interested in GiT are comparing it to the libraries listed below.
- [ICCV2023] Official Implementation of "UniTR: A Unified and Efficient Multi-Modal Transformer for Bird’s-Eye-View Representation" ⭐ 354 · Sep 4, 2024 · Updated last year
- [CVPR2023] Official Implementation of "DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets" ⭐ 451 · Sep 4, 2024 · Updated last year
- [CVPR 2024] Official implementation of the paper "Visual In-context Learning" ⭐ 534 · Apr 8, 2024 · Updated 2 years ago
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings" ⭐ 132 · Aug 21, 2024 · Updated last year
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… ⭐ 953 · Aug 5, 2025 · Updated 9 months ago
- [CVPR2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale ⭐ 1,172 · Oct 21, 2024 · Updated last year
- [ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization ⭐ 587 · Jun 7, 2024 · Updated last year
- [CVPR2022] This is the official code of "RBGNet: Ray-based Grouping for 3D Object Detection". ⭐ 40 · Jul 17, 2023 · Updated 2 years ago
- [CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era" ⭐ 211 · Jun 9, 2024 · Updated last year
- Official Repo For OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24] ⭐ 1,346 · Oct 15, 2025 · Updated 6 months ago
- This is the official code release for our work, Denoising Vision Transformers. ⭐ 397 · Nov 13, 2024 · Updated last year
- VisionLLM Series ⭐ 1,144 · Feb 27, 2025 · Updated last year
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design. ⭐ 1,996 · Nov 7, 2025 · Updated 6 months ago
- [CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts ⭐ 338 · Jul 17, 2024 · Updated last year
- VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation ⭐ 86 · Sep 12, 2024 · Updated last year
- [ECCV 2024] The official code of paper "Open-Vocabulary SAM". ⭐ 1,033 · Aug 4, 2025 · Updated 9 months ago
- [ICCV 23] A Simple Vision Transformer for Weakly Semi-supervised 3D Object Detection ⭐ 13 · Apr 12, 2024 · Updated 2 years ago
- [NeurIPS2025 Spotlight 🔥] Official implementation of 🛸 "UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Langu… ⭐ 273 · Nov 5, 2025 · Updated 6 months ago
- [ECCV 2024] Tokenize Anything via Prompting ⭐ 602 · Dec 11, 2024 · Updated last year
- [ICLR 2025 oral] RMP-SAM: Towards Real-Time Multi-Purpose Segment Anything ⭐ 268 · Apr 11, 2025 · Updated last year
- [ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition&Understanding and General Relation Comprehension of … ⭐ 507 · Aug 9, 2024 · Updated last year
- [CVPR 2024] PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding. ⭐ 267 · Feb 11, 2025 · Updated last year
- VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks ⭐ 392 · Jul 9, 2024 · Updated last year
- Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024) ⭐ 187 · Jul 5, 2024 · Updated last year
- ⭐ 1,843 · Jun 28, 2024 · Updated last year
- Project Page for "LISA: Reasoning Segmentation via Large Language Model" ⭐ 2,628 · Feb 16, 2025 · Updated last year
- Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation ⭐ 1,947 · Aug 15, 2024 · Updated last year
- ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention (ECCV 2024) ⭐ 82 · May 20, 2025 · Updated 11 months ago
- EVA Series: Visual Representation Fantasies from BAAI ⭐ 2,669 · Aug 1, 2024 · Updated last year
- PyTorch Implementation of "V* : Guided Visual Search as a Core Mechanism in Multimodal LLMs" ⭐ 704 · Jan 7, 2024 · Updated 2 years ago
- [CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation". ⭐ 459 · Aug 8, 2025 · Updated 8 months ago
- Official repository for "AM-RADIO: Reduce All Domains Into One" ⭐ 1,779 · Apr 30, 2026 · Updated last week
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception ⭐ 159 · Dec 6, 2024 · Updated last year
- [CVPR 2024 Highlight] Visual Point Cloud Forecasting ⭐ 348 · Jul 2, 2025 · Updated 10 months ago
- [CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI ⭐ 662 · Jun 13, 2025 · Updated 10 months ago
- (ECCVW 2025) GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest ⭐ 555 · Jun 3, 2025 · Updated 11 months ago
- This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects. ⭐ 1,416 · Aug 4, 2025 · Updated 9 months ago
- NeurIPS 2025 Spotlight; ICLR2024 Spotlight; CVPR 2024; EMNLP 2024 ⭐ 1,834 · Nov 27, 2025 · Updated 5 months ago
- Official implementation of 'CLIP-DINOiser: Teaching CLIP a few DINO tricks' paper. ⭐ 279 · Oct 26, 2024 · Updated last year