[ECCV2024 Oral 🔥] Official Implementation of "GiT: Towards Generalist Vision Transformer through Universal Language Interface"
★362 · Jan 14, 2025 · Updated last year
Alternatives and similar repositories for GiT
Users who are interested in GiT are comparing it to the libraries listed below.
- [ICCV2023] Official Implementation of "UniTR: A Unified and Efficient Multi-Modal Transformer for Bird’s-Eye-View Representation" · ★353 · Sep 4, 2024 · Updated last year
- [CVPR2023] Official Implementation of "DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets" · ★450 · Sep 4, 2024 · Updated last year
- [CVPR 2024] Official implementation of the paper "Visual In-context Learning" · ★540 · Apr 8, 2024 · Updated 2 years ago
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings" · ★132 · Aug 21, 2024 · Updated last year
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… · ★959 · Aug 5, 2025 · Updated 10 months ago
- [CVPR2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale · ★1,172 · Oct 21, 2024 · Updated last year
- [ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization · ★586 · Jun 7, 2024 · Updated 2 years ago
- [CVPR2022] This is the official code of "RBGNet: Ray-based Grouping for 3D Object Detection". · ★40 · Jul 17, 2023 · Updated 2 years ago
- [CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era" · ★211 · Jun 9, 2024 · Updated 2 years ago
- Official Repo For OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24] · ★1,346 · Oct 15, 2025 · Updated 8 months ago
- This is the official code release for our work, Denoising Vision Transformers. · ★400 · Nov 13, 2024 · Updated last year
- VisionLLM Series · ★1,148 · Feb 27, 2025 · Updated last year
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design. · ★2,004 · Nov 7, 2025 · Updated 7 months ago
- VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation · ★86 · Sep 12, 2024 · Updated last year
- [CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts · ★339 · Jul 17, 2024 · Updated last year
- [ECCV 2024] The official code of paper "Open-Vocabulary SAM". · ★1,032 · Aug 4, 2025 · Updated 10 months ago
- [ICCV 23] A Simple Vision Transformer for Weakly Semi-supervised 3D Object Detection · ★13 · Apr 12, 2024 · Updated 2 years ago
- [NeurIPS2025 Spotlight 🔥] Official implementation of 🛸 "UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Langu… · ★275 · Nov 5, 2025 · Updated 7 months ago
- [ECCV 2024] Tokenize Anything via Prompting · ★602 · Dec 11, 2024 · Updated last year
- [ICLR 2025 oral] RMP-SAM: Towards Real-Time Multi-Purpose Segment Anything · ★269 · Apr 11, 2025 · Updated last year
- [ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition&Understanding and General Relation Comprehension of … · ★509 · Aug 9, 2024 · Updated last year
- [CVPR 2024] PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding. · ★269 · Feb 11, 2025 · Updated last year
- VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks · ★392 · Jul 9, 2024 · Updated last year
- Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024) · ★188 · Jul 5, 2024 · Updated last year
- ★1,835 · Jun 28, 2024 · Updated last year
- Project Page for "LISA: Reasoning Segmentation via Large Language Model" · ★2,648 · Feb 16, 2025 · Updated last year
- Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation · ★1,954 · Aug 15, 2024 · Updated last year
- ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention (ECCV 2024) · ★80 · May 20, 2025 · Updated last year
- EVA Series: Visual Representation Fantasies from BAAI · ★2,683 · Aug 1, 2024 · Updated last year
- PyTorch Implementation of "V* : Guided Visual Search as a Core Mechanism in Multimodal LLMs" · ★703 · Jan 7, 2024 · Updated 2 years ago
- [CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation". · ★465 · Aug 8, 2025 · Updated 10 months ago
- Official repository for "AM-RADIO: Reduce All Domains Into One" · ★1,857 · May 29, 2026 · Updated 2 weeks ago
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception · ★159 · Dec 6, 2024 · Updated last year
- [CVPR 2024 Highlight] Visual Point Cloud Forecasting · ★349 · Jul 2, 2025 · Updated 11 months ago
- [CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI · ★668 · Jun 13, 2025 · Updated last year
- (ECCVW 2025) GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest · ★555 · Jun 3, 2025 · Updated last year
- This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects. · ★1,420 · Aug 4, 2025 · Updated 10 months ago
- NeurIPS 2025 Spotlight; ICLR2024 Spotlight; CVPR 2024; EMNLP 2024 · ★1,839 · Nov 27, 2025 · Updated 6 months ago
- Official implementation of 'CLIP-DINOiser: Teaching CLIP a few DINO tricks' paper. · ★280 · Oct 26, 2024 · Updated last year