zhangzjn / EMOv2
[T-PAMI 2025] EMOv2: Pushing 5M Vision Model Frontier
☆52Updated 10 months ago
Alternatives and similar repositories for EMOv2
Users who are interested in EMOv2 compare it to the repositories listed below.
- Scaling Vision Pre-Training to 4K Resolution☆211Updated 3 months ago
- ☆56Updated 7 months ago
- ☆53Updated 6 months ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment☆62Updated 4 months ago
- [CVPR 2025] DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception☆144Updated 5 months ago
- Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"☆118Updated 2 weeks ago
- Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning☆127Updated 4 months ago
- Official implementation of Add-SD: Rational Generation without Manual Reference.☆28Updated last year
- ☆93Updated 8 months ago
- (ICCV 2025) ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations☆119Updated 2 weeks ago
- Official PyTorch implementation of "No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding"☆32Updated last year
- We propose FlexEdit, an end-to-end image editing method that leverages both free-shape masks and language instructions for Flexible Editi…☆32Updated last year
- Odd-One-Out: Anomaly Detection by Comparing with Neighbors (CVPR25)☆54Updated 11 months ago
- Official PyTorch Implementation of "Scalable Autoregressive Image Generation with Mamba"☆141Updated 10 months ago
- This repo is the official implementation of iSeg: An Iterative Refinement-based Framework for Training-free Segmentation.☆39Updated last year
- [ICCV2025] Referring any person or objects given a natural language description. Code base for RexSeek and HumanRef Benchmark☆172Updated last month
- PyTorch code for "ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning"☆20Updated last year
- [ICCV2025] Harnessing CLIP, DINO and SAM for Open Vocabulary Segmentation☆94Updated last week
- Code for the paper "Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers" [ICCV 2025]☆95Updated 4 months ago
- Code release for AccDiffusionV2 (TPAMI)☆35Updated 3 weeks ago
- Official implementation of "Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs".☆94Updated 3 weeks ago
- PyTorch Implementation of "SMITE: Segment Me In TimE" (ICLR 2025)☆212Updated 2 weeks ago
- ☆20Updated 2 years ago
- FaceXBench: Evaluating Multimodal LLMs on Face Understanding☆18Updated 9 months ago
- Code and dataset link for "DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World"☆115Updated last month
- [IEEE TCSVT] Official PyTorch Implementation of CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation.☆46Updated 10 months ago
- Official Repo for PosSAM: Panoptic Open-vocabulary Segment Anything☆70Updated last year
- LiVOS: Light Video Object Segmentation with Gated Linear Matching (CVPR 2025)☆41Updated 2 months ago
- [IJCV 2024] MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation☆128Updated last year
- [CVPR'2025] EntitySAM: Segment Everything in Video☆54Updated 4 months ago