hkust-vgd / MarineGPT
The official implementation of MarineGPT
☆24Updated 8 months ago
Related projects
Alternatives and complementary repositories for MarineGPT
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models☆43Updated 5 months ago
- The official implementation of the paper "Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation"☆16Updated 4 months ago
- An official PyTorch implementation of "Masked Jigsaw Puzzle: A Versatile Position Embedding for Vision Transformers", CVPR 2023.☆40Updated last year
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…☆32Updated 5 months ago
- Code for "Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation"☆26Updated 8 months ago
- LAVIS - A One-stop Library for Language-Vision Intelligence☆47Updated 3 months ago
- [ICCV2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control"☆53Updated last year
- Source code for the paper "Prefix Language Models are Unified Modal Learners"☆43Updated last year
- Official repository for the General Robust Image Task (GRIT) Benchmark☆50Updated last year
- ☆15Updated 2 years ago
- Official implementation for CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding☆42Updated last year
- The released data for paper "Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models".☆32Updated last year
- ☆29Updated last year
- Code for Point-Level Region Contrast (https://arxiv.org/abs/2202.04639)☆32Updated last year
- [ECCV 2024] This is the official implementation of "Stitched ViTs are Flexible Vision Backbones".☆23Updated 9 months ago
- ☆45Updated last year
- Official implementation of MIA-DPO☆40Updated 2 weeks ago
- ☆30Updated this week
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".☆56Updated last year
- Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision☆24Updated last month
- ☆36Updated last month
- Official Repo for the paper: VCR: Visual Caption Restoration. Check arxiv.org/pdf/2406.06462 for details.☆24Updated last month
- (ECCV 2024) Can OOD Object Detectors Learn from Foundation Models?☆18Updated 2 months ago
- [CVPR 2024] DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model☆16Updated 7 months ago
- [EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding☆47Updated 10 months ago
- 🔥 Aurora Series: A more efficient multimodal large language model series for video.☆47Updated last week
- A visual LLM for image region description or QA.☆15Updated last year
- Pixel Propagation for unsupervised visual representation learning☆10Updated 3 years ago
- Code for IterInpaint model, presented in Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation (CVPR 2024 work…☆23Updated 4 months ago
- ☆19Updated 11 months ago