LHBuilder / SA-Segment-Anything
Vision-oriented multimodal AI
☆49 · Updated 4 months ago
Related projects
Alternatives and complementary repositories for SA-Segment-Anything
- Official PyTorch Implementation of Self-emerging Token Labeling ☆30 · Updated 7 months ago
- MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment ☆31 · Updated 4 months ago
- Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want ☆59 · Updated 2 weeks ago
- [NeurIPS 2022] This is the official implementation of the paper "Expediting Large-Scale Vision Transformer for Dense Prediction without Fi… ☆82 · Updated last year
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs ☆76 · Updated 4 months ago
- ☆29 · Updated 3 weeks ago
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration ☆16 · Updated 3 weeks ago
- ☆72 · Updated 8 months ago
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context ☆130 · Updated last month
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… ☆26 · Updated last month
- [ECCV 2024] Parrot Captions Teach CLIP to Spot Text ☆60 · Updated 2 months ago
- Codes for ICML 2023 Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation ☆35 · Updated last year
- ☆20 · Updated 11 months ago
- The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”. ☆32 · Updated last month
- Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed ☆54 · Updated 2 weeks ago
- Project for "LaSagnA: Language-based Segmentation Assistant for Complex Queries". ☆47 · Updated 6 months ago
- The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever. ☆68 · Updated 2 months ago
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings" ☆110 · Updated 2 months ago
- LAVIS - A One-stop Library for Language-Vision Intelligence ☆48 · Updated 3 months ago
- Code for our Paper "All in an Aggregated Image for In-Image Learning" ☆29 · Updated 7 months ago
- Detectron2 is a platform for object detection, segmentation and other visual recognition tasks. ☆16 · Updated 2 years ago
- ☆45 · Updated last year
- Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding" ☆16 · Updated this week
- Code for experiments for "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy" ☆95 · Updated last month
- This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model ☆90 · Updated 3 months ago
- ☆33 · Updated 9 months ago
- Implementation of ViTAR: Vision Transformer with Any Resolution in PyTorch ☆24 · Updated this week
- GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024) ☆58 · Updated 10 months ago