☆17 · Aug 7, 2024 · Updated last year
Alternatives and similar repositories for perceptionGPT
Users who are interested in perceptionGPT are comparing it to the repositories listed below.
- [ICLR 2026] Official repo for "Spotlight on Token Perception for Multimodal Reinforcement Learning" ☆49 · Jan 30, 2026 · Updated last month
- [AAAI 2025] Does VLM Classification Benefit from LLM Description Semantics? ☆25 · Aug 5, 2025 · Updated 7 months ago
- ☆13 · Jul 30, 2024 · Updated last year
- ☆35 · Nov 25, 2025 · Updated 3 months ago
- ALPS: An Auto-Labeling and Pre-training Scheme for Remote Sensing Segmentation With Segment Anything Model ☆21 · Aug 20, 2024 · Updated last year
- Code for Learning the Unlearned: Mitigating Feature Suppression in Contrastive Learning ☆20 · Jul 16, 2024 · Updated last year
- [ICCV 2025] Official implementation of "InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models" ☆53 · Feb 10, 2025 · Updated last year
- ☆21 · Aug 27, 2025 · Updated 6 months ago
- Awesome autoregressive vision foundation models ☆26 · Dec 24, 2024 · Updated last year
- Code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization" ☆60 · Aug 23, 2024 · Updated last year
- The implementation for CIKM 2024: Towards Completeness-Oriented Tool Retrieval for Large Language Models. ☆24 · Nov 6, 2024 · Updated last year
- [NeurIPS 2024] OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling. ☆30 · Nov 13, 2025 · Updated 3 months ago
- ☆43 · Jul 31, 2025 · Updated 7 months ago
- [ECCV 2024] This is an official implementation for "PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model" ☆269 · Dec 30, 2024 · Updated last year
- ☆28 · Jul 22, 2024 · Updated last year
- Official Repo for the paper: VCR: Visual Caption Restoration. Check arxiv.org/pdf/2406.06462 for details. ☆32 · Feb 26, 2025 · Updated last year
- Codebase for fine-tuning Llama2 70B to generate math test questions and answers. ☆11 · Aug 30, 2024 · Updated last year
- [NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models ☆79 · Dec 27, 2025 · Updated 2 months ago
- Official Repo for PosSAM: Panoptic Open-vocabulary Segment Anything ☆70 · Apr 7, 2024 · Updated last year
- ☆33 · Sep 27, 2024 · Updated last year
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models ☆77 · Jul 13, 2024 · Updated last year
- [ICLR 2025] Text4Seg: Reimagining Image Segmentation as Text Generation ☆163 · Nov 8, 2025 · Updated 3 months ago
- Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity" ☆33 · Oct 12, 2024 · Updated last year
- MLLMSeg: Unlocking the Potential of MLLMs in Referring Expression Segmentation via a Light-weight Mask Decoder ☆51 · Aug 16, 2025 · Updated 6 months ago
- ☆11 · Dec 23, 2024 · Updated last year
- Repo for paper "CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models". ☆12 · Oct 14, 2024 · Updated last year
- Concurrency library ☆17 · Oct 13, 2024 · Updated last year
- [CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos ☆97 · Apr 14, 2025 · Updated 10 months ago
- [ECCV 2024] ControlCap: Controllable Region-level Captioning ☆80 · Oct 25, 2024 · Updated last year
- [NeurIPS 2025] The official repository of "Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tun… ☆40 · Feb 20, 2025 · Updated last year
- ☆41 · Dec 10, 2024 · Updated last year
- Code and models for the paper "The effectiveness of MAE pre-pretraining for billion-scale pretraining" https://arxiv.org/abs/2303.13496 ☆92 · Feb 19, 2026 · Updated last week
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives" ☆46 · Dec 1, 2024 · Updated last year
- The official codes for "AutoRG-Brain: Grounded Report Generation for Brain MRI". ☆49 · Jan 6, 2026 · Updated last month
- Code release for "Weakly Supervised Open-Vocabulary Object Detection", AAAI 2024 ☆35 · Sep 9, 2024 · Updated last year
- [Preprint] GMem: A Modular Approach for Ultra-Efficient Generative Models ☆43 · Mar 11, 2025 · Updated 11 months ago
- OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models ☆29 · Feb 4, 2026 · Updated last month
- Develop C++/CUDA extensions with PyTorch like Python scripts ☆10 · Jan 7, 2026 · Updated last month
- Official PyTorch implementation of The Linear Attention Resurrection in Vision Transformer ☆16 · Sep 7, 2024 · Updated last year