zhangzjn / EMOv2
EMOv2: Pushing 5M Vision Model Frontier
☆46 · Updated 5 months ago
Alternatives and similar repositories for EMOv2
Users interested in EMOv2 are comparing it to the repositories listed below.
- The official implementation of "VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning" ☆143 · Updated last week
- ☆77 · Updated 3 months ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment ☆51 · Updated 5 months ago
- Harnessing CLIP, DINO and SAM for Open Vocabulary Segmentation ☆58 · Updated 3 months ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… ☆29 · Updated 8 months ago
- Official implementation of Add-SD: Rational Generation without Manual Reference. ☆27 · Updated 9 months ago
- PyTorch code for "ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning" ☆20 · Updated 7 months ago
- ☆52 · Updated last month
- Project for "LaSagnA: Language-based Segmentation Assistant for Complex Queries". ☆56 · Updated last year
- [CVPR 2025] DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution ☆50 · Updated 3 months ago
- VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning ☆140 · Updated 3 weeks ago
- Scaling Vision Pre-Training to 4K Resolution ☆162 · Updated this week
- Precision Search through Multi-Style Inputs ☆70 · Updated last month
- [ECCV 2024] ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation ☆90 · Updated 2 months ago
- ☆32 · Updated 2 months ago
- [CVPR 2025] DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception ☆55 · Updated 2 weeks ago
- [ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models ☆27 · Updated last month
- ACTIVE-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO ☆58 · Updated last week
- ☆81 · Updated 2 months ago
- [CVPR 2025] Official repository of the paper "Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation" ☆92 · Updated 2 weeks ago
- This repository provides an improved LlamaGen model, fine-tuned on 500,000 high-quality images, each accompanied by over 300 token prompt… ☆30 · Updated 7 months ago
- Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding ☆184 · Updated 4 months ago
- [IJCV 2024] MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation ☆123 · Updated 7 months ago
- Code for the paper "Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers" ☆68 · Updated 2 months ago
- Official Repo for PosSAM: Panoptic Open-vocabulary Segment Anything ☆63 · Updated last year
- [ECCV 2024] SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding ☆58 · Updated 7 months ago
- ☆42 · Updated 3 weeks ago
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024 ☆59 · Updated 3 months ago
- [CVPR 2025] Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction ☆110 · Updated 2 months ago
- ☆78 · Updated 6 months ago