zhangzjn / EMOv2
EMOv2: Pushing 5M Vision Model Frontier
☆46Updated 4 months ago
Alternatives and similar repositories for EMOv2
Users who are interested in EMOv2 are comparing it to the libraries listed below.
- Official implementation of Add-SD: Rational Generation without Manual Reference.☆27Updated 8 months ago
- ☆72Updated 2 months ago
- ☆40Updated this week
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment☆50Updated 4 months ago
- Harnessing CLIP, DINO and SAM for Open Vocabulary Segmentation☆59Updated 2 months ago
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model☆42Updated 9 months ago
- ☆52Updated 3 weeks ago
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024☆60Updated 2 months ago
- [Fully open] [Encoder-free MLLM] Vision as LoRA☆161Updated last month
- New generation of CLIP with fine grained discrimination capability, ICML2025☆89Updated this week
- ☆74Updated 6 months ago
- [ECCV2024] ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation☆89Updated last month
- VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning☆121Updated last week
- [CVPR 2025] Official repository of the paper "Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation"☆88Updated 2 months ago
- CVPR 2025 Workshop on CVEU.☆39Updated last month
- Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding☆185Updated 3 months ago
- [NeurIPS 2024] official code release for our paper "Revisiting the Integration of Convolution and Attention for Vision Backbone".☆39Updated 3 months ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo…☆29Updated 7 months ago
- Scaling Vision Pre-Training to 4K Resolution☆157Updated 2 weeks ago
- [ECCV'24 Workshops Oral] DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling☆31Updated 6 months ago
- The repository contains the official implementation of "Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation"☆40Updated 2 months ago
- [CVPR 2025] Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction☆105Updated last month
- ☆44Updated 4 months ago
- [NeurIPS 2024] Official PyTorch Implementation of "FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner"☆69Updated 7 months ago
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines☆125Updated 6 months ago
- PyTorch code for "ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning"☆20Updated 6 months ago
- [IEEE TCSVT] Official PyTorch Implementation of CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation.☆42Updated 4 months ago
- [ICLR2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want☆72Updated 3 months ago
- Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model☆60Updated 4 months ago
- Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model☆92Updated last month