hustvl / MaTVLM
☆33 · Updated last week
Alternatives and similar repositories for MaTVLM:
Users who are interested in MaTVLM are comparing it to the repositories listed below.
- ☆54 · Updated 2 weeks ago
- ☆10 · Updated 4 months ago
- ☆16 · Updated last year
- The first decoder-only multimodal state space model ☆80 · Updated last week
- ☆27 · Updated 2 months ago
- [IJCV 2024] ☆14 · Updated 4 months ago
- MIM4D: Masked Modeling with Multi-View Video for Autonomous Driving Representation Learning ☆62 · Updated last year
- ☆46 · Updated 4 months ago
- Project for "LaSagnA: Language-based Segmentation Assistant for Complex Queries". ☆53 · Updated 11 months ago
- Autoregressive Image Generation with Randomized Parallel Decoding ☆35 · Updated this week
- ☆12 · Updated 3 months ago
- (ICLR 2024, CVPR 2024) SparseFormer ☆73 · Updated 4 months ago
- 4D Panoptic Scene Graph Generation (NeurIPS'23 Spotlight) ☆105 · Updated 2 weeks ago
- [CVPR 2025] GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding ☆123 · Updated last week
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment ☆45 · Updated 3 months ago
- [CVPR 2025] Official repository of the paper "Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation" ☆81 · Updated last month
- LEO: A powerful Hybrid Multimodal LLM ☆15 · Updated 2 months ago
- Code release for "SegLLM: Multi-round Reasoning Segmentation" ☆70 · Updated last month
- Scaling Vision Pre-Training to 4K Resolution ☆93 · Updated last week
- Harnessing CLIP, DINO and SAM for Open Vocabulary Segmentation ☆46 · Updated 3 weeks ago
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024 ☆58 · Updated last month
- [NeurIPS 2024] Official code release for our paper "Revisiting the Integration of Convolution and Attention for Vision Backbone". ☆35 · Updated 2 months ago
- Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model ☆49 · Updated 2 months ago
- The official implementation of the paper "ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations". ☆35 · Updated 2 months ago
- ☆56 · Updated last week
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings" ☆122 · Updated 7 months ago
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)" ☆35 · Updated 9 months ago
- EMOv2: Pushing 5M Vision Model Frontier ☆45 · Updated 3 months ago
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling ☆26 · Updated last month
- [ACM MM 2024] WeakSAM: Segment Anything Meets Weakly-supervised Instance-level Recognition ☆48 · Updated 6 months ago