[ECCV2024 Oral 🔥] Official Implementation of "GiT: Towards Generalist Vision Transformer through Universal Language Interface"
★360 · Jan 14, 2025 · Updated last year
Alternatives and similar repositories for GiT
Users that are interested in GiT are comparing it to the libraries listed below.
- [ICCV2023] Official Implementation of "UniTR: A Unified and Efficient Multi-Modal Transformer for Bird's-Eye-View Representation" · ★353 · Sep 4, 2024 · Updated last year
- [CVPR2023] Official Implementation of "DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets" · ★452 · Sep 4, 2024 · Updated last year
- [CVPR 2024] Official implementation of the paper "Visual In-context Learning" · ★531 · Apr 8, 2024 · Updated last year
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings" · ★131 · Aug 21, 2024 · Updated last year
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… · ★951 · Aug 5, 2025 · Updated 7 months ago
- [CVPR2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale · ★1,171 · Oct 21, 2024 · Updated last year
- [ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization · ★584 · Jun 7, 2024 · Updated last year
- [CVPR2022] This is the official code of "RBGNet: Ray-based Grouping for 3D Object Detection". · ★40 · Jul 17, 2023 · Updated 2 years ago
- [CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era" · ★212 · Jun 9, 2024 · Updated last year
- Official Repo For OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24] · ★1,344 · Oct 15, 2025 · Updated 5 months ago
- This is the official code release for our work, Denoising Vision Transformers. · ★395 · Nov 13, 2024 · Updated last year
- VisionLLM Series · ★1,139 · Feb 27, 2025 · Updated last year
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design. · ★1,995 · Nov 7, 2025 · Updated 4 months ago
- [CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts · ★336 · Jul 17, 2024 · Updated last year
- VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation · ★86 · Sep 12, 2024 · Updated last year
- [ECCV 2024] The official code of paper "Open-Vocabulary SAM". · ★1,030 · Aug 4, 2025 · Updated 7 months ago
- [ICCV 23] A Simple Vision Transformer for Weakly Semi-supervised 3D Object Detection · ★13 · Apr 12, 2024 · Updated last year
- [NeurIPS2025 Spotlight 🔥] Official implementation of 🛸 "UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Langu… · ★269 · Nov 5, 2025 · Updated 4 months ago
- [ECCV 2024] Tokenize Anything via Prompting · ★602 · Dec 11, 2024 · Updated last year
- [ICLR 2025 oral] RMP-SAM: Towards Real-Time Multi-Purpose Segment Anything · ★269 · Apr 11, 2025 · Updated 11 months ago
- [CVPR 2024] PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding. · ★257 · Feb 11, 2025 · Updated last year
- [ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of … · ★506 · Aug 9, 2024 · Updated last year
- VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks · ★392 · Jul 9, 2024 · Updated last year
- Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024) · ★186 · Jul 5, 2024 · Updated last year
- ★1,841 · Jun 28, 2024 · Updated last year
- Project Page for "LISA: Reasoning Segmentation via Large Language Model" · ★2,606 · Feb 16, 2025 · Updated last year
- [CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation". · ★449 · Aug 8, 2025 · Updated 7 months ago
- Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation · ★1,940 · Aug 15, 2024 · Updated last year
- PyTorch Implementation of "V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs" · ★697 · Jan 7, 2024 · Updated 2 years ago
- ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention (ECCV 2024) · ★82 · May 20, 2025 · Updated 10 months ago
- EVA Series: Visual Representation Fantasies from BAAI · ★2,655 · Aug 1, 2024 · Updated last year
- Official repository for "AM-RADIO: Reduce All Domains Into One" · ★1,715 · Feb 11, 2026 · Updated last month
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception · ★159 · Dec 6, 2024 · Updated last year
- [CVPR 2024 Highlight] Visual Point Cloud Forecasting · ★348 · Jul 2, 2025 · Updated 8 months ago
- This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects. · ★1,411 · Aug 4, 2025 · Updated 7 months ago
- (ECCVW 2025) GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest · ★553 · Jun 3, 2025 · Updated 9 months ago
- [CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI · ★658 · Jun 13, 2025 · Updated 9 months ago
- [NeurIPS24 Spotlight] Voxel Mamba: Group-Free State Space Models for Point Cloud based 3D Object Detection · ★155 · Sep 26, 2024 · Updated last year
- NeurIPS 2025 Spotlight; ICLR2024 Spotlight; CVPR 2024; EMNLP 2024 · ★1,826 · Nov 27, 2025 · Updated 4 months ago