Official repository of paper "Subobject-level Image Tokenization" (ICML-25)
☆92Jul 4, 2025Updated 7 months ago
Alternatives and similar repositories for subobjects
Users that are interested in subobjects are comparing it to the libraries listed below
- ☆38Feb 8, 2024Updated 2 years ago
- The official implementation of MutDet (MutDet: Mutually Optimizing Pre-training for Remote Sensing Object Detection, ECCV 2024).☆24Oct 24, 2024Updated last year
- The extended code of layered conceptual image compression. Journal submitted.☆15Aug 29, 2022Updated 3 years ago
- Code implementation of RP3D-Diag☆17Nov 25, 2024Updated last year
- Open Source Road Datasets☆18Aug 30, 2024Updated last year
- Official implementation of https://arxiv.org/abs/2108.11554 paper☆13Feb 22, 2022Updated 4 years ago
- EagleVision: Object-level Attribute Multimodal LLM for Remote Sensing☆20May 29, 2025Updated 9 months ago
- RO-ViT CVPR 2023 "Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers"☆17Aug 24, 2023Updated 2 years ago
- ☆39Jan 3, 2025Updated last year
- Two-way Multi-Label Loss☆35May 12, 2023Updated 2 years ago
- [TIP 2025] Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation☆58Dec 22, 2025Updated 2 months ago
- VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks☆390Jul 9, 2024Updated last year
- [ICCV'25] When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning☆47Feb 16, 2026Updated last week
- Official repository for the paper MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning (https://arxiv.org/abs/2406.17770).☆159Sep 27, 2024Updated last year
- Masked Angle-Aware Autoencoder for Remote Sensing Images (ECCV 2024)☆28Nov 12, 2024Updated last year
- [ICLR 2025] Official Pytorch Implementation of MMR: A Large-scale Benchmark Dataset for Multi-target and Multi-granularity Reasoning Segm…☆24Apr 3, 2025Updated 10 months ago
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration☆26Oct 17, 2024Updated last year
- [ICCV 2023] ViLLA: Fine-grained vision-language representation learning from real-world data☆46Oct 15, 2023Updated 2 years ago
- GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding☆79May 10, 2025Updated 9 months ago
- VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis☆110Feb 19, 2025Updated last year
- [ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'☆346Apr 20, 2025Updated 10 months ago
- [Findings of ACL-2023] This is the official implementation of On the Difference of BERT-style and CLIP-style Text Encoders.☆14Jun 7, 2023Updated 2 years ago
- [TGRS'25] AirSpatialBot: A Spatially-Aware Aerial Agent for Fine-Grained Vehicle Attribute Recognization and Retrieval☆29Jan 6, 2026Updated last month
- [ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents☆48Feb 27, 2025Updated last year
- [IJCV 2025] Paragraph-to-Image Generation with Information-Enriched Diffusion Model☆106Mar 24, 2025Updated 11 months ago
- "Near, far: Patch-ordering enhances vision foundation models' scene understanding": A New SSL Post-Training Approach for Improving DINOv2…☆29Apr 20, 2025Updated 10 months ago
- Official PyTorch implementation of ZiRa, a method for incremental vision language object detection (IVLOD), which has been accepted by Neu…☆28Oct 22, 2024Updated last year
- Switch EMA: A Free Lunch for Better Flatness and Sharpness☆28Feb 16, 2024Updated 2 years ago
- [CVPR'24] PointOBB: Learning Oriented Object Detection via Single Point Supervision☆77Jan 24, 2025Updated last year
- [CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want☆867Jul 20, 2025Updated 7 months ago
- Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think!☆121Mar 4, 2025Updated 11 months ago
- This is a PyTorch implementation of the paper IDA-SiamNet: Interactive- and Dynamic-Aware Siamese Network for Building Change Detection☆13Aug 21, 2024Updated last year
- Code for Learned Thresholds Token Merging and Pruning for Vision Transformers (LTMP). A technique to reduce the size of Vision Transforme…☆17Nov 24, 2024Updated last year
- ☆13Sep 16, 2022Updated 3 years ago
- A Jax/Flax implementation for Korean-language LLM inference on TPU.☆12Jun 12, 2023Updated 2 years ago
- MPI Code Generation through Domain-Specific Language Models☆14Nov 19, 2024Updated last year
- Collections of papers and code for employing MLLM for quality assessment tasks.☆13Apr 18, 2024Updated last year
- An up-to-date & curated list of awesome layout to image papers, methods & resources.☆13Jun 28, 2024Updated last year
- ☆12Mar 28, 2022Updated 3 years ago