FoundationVision / GLEE
[CVPR2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale
☆1,027 Updated last month
Related projects:
- [ECCV 2024] The official code of paper "Open-Vocabulary SAM". ☆902 Updated last month
- Official PyTorch implementation of "EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM" ☆904 Updated last month
- Project Page for "LISA: Reasoning Segmentation via Large Language Model" ☆1,754 Updated 2 months ago
- [CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning" ☆748 Updated last month
- API for Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series ☆707 Updated last month
- [ECCV 2024] Tokenize Anything via Prompting ☆502 Updated 2 months ago
- OMG-LLaVA and OMG-Seg codebase ☆1,222 Updated last month
- [CVPR'23] Universal Instance Perception as Object Discovery and Retrieval ☆1,488 Updated last year
- [ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization ☆537 Updated 3 months ago
- [GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generatio… ☆4,003 Updated 2 months ago
- (TPAMI 2024) A Survey on Open Vocabulary Learning ☆794 Updated 3 weeks ago
- [CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception ☆474 Updated 4 months ago
- [ICCV 2023] Official implementation of the paper "A Simple Framework for Open-Vocabulary Segmentation and Detection" ☆637 Updated 7 months ago
- Official PyTorch implementation of "TinySAM: Pushing the Envelope for Efficient Segment Anything Model" ☆390 Updated 5 months ago
- Personalize Segment Anything Model (SAM) with 1 shot in 10 seconds ☆1,493 Updated last month
- [CVPR 2024] Official implementation of the paper "Visual In-context Learning" ☆363 Updated 5 months ago
- Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2 ☆710 Updated last week
- An efficient modular implementation of Associating Objects with Transformers for Video Object Segmentation in PyTorch ☆601 Updated 5 months ago
- EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything ☆2,081 Updated 3 months ago
- Collection of AWESOME vision-language models for vision tasks ☆2,213 Updated 3 weeks ago
- [ECCV 2024] The official implementation of paper "BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion" ☆1,334 Updated 2 months ago
- Official Implementation of CVPR24 highlight paper: Matching Anything by Segmenting Anything ☆947 Updated this week
- [ECCV 2024] Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity" ☆2,255 Updated 2 months ago
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha…