TencentARC / ViSFT
☆32 Updated 7 months ago
Related projects:
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model ☆36 Updated last month
- ☆20 Updated 9 months ago
- Efficient Multi-modal Models via Stage-wise Visual Context Compression ☆34 Updated last month
- [ECCV2024] ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation ☆45 Updated 2 weeks ago
- Wire Removal Video Datasets 2 (WRV2)