Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos
☆304Sep 28, 2025Updated 5 months ago
Alternatives and similar repositories for PAM
Users that are interested in PAM are comparing it to the libraries listed below
- Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning☆142Jun 30, 2025Updated 8 months ago
- Official PyTorch implementation of the paper "FlowDirector: Training-Free Flow Steering for Precise Text-to-Video Editing"☆76Dec 12, 2025Updated 2 months ago
- [CVPR2025] Official code repository for SeTa: "Scale Efficient Training for Large Datasets"☆23Mar 18, 2025Updated 11 months ago
- rmp data ranking☆13Nov 4, 2025Updated 3 months ago
- Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding☆211Oct 15, 2025Updated 4 months ago
- Segment This Thing is an efficient image segmentation model that uses a biologically-inspired foveated tokenization to reduce inference …☆55Jun 16, 2025Updated 8 months ago
- Includes the VideoCount dataset and CountVid code for the paper Open-World Object Counting in Videos.☆89Dec 15, 2025Updated 2 months ago
- [CVPR 2025] A Unified Image-Dense Annotation Generation Model for Underwater Scenes☆54Apr 9, 2025Updated 10 months ago
- [NeurIPS 2025] Streaming 3D Reconstruction with Explicit Spatial Pointer Memory☆179Sep 26, 2025Updated 5 months ago
- [arXiv'25]🌈 Unseen 3D Geometry Reasoning from a Single Image.☆74Jul 10, 2025Updated 7 months ago
- Official implementation of the paper "Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-based Embedding Rou…☆34Sep 25, 2025Updated 5 months ago
- [CVPR 2025 highlight] Generating 6DoF Object Manipulation Trajectories from Action Description in Egocentric Vision☆33Dec 2, 2025Updated 2 months ago
- [ICCV 2025] Official code for paper: Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs☆68Jul 1, 2025Updated 7 months ago
- [CVPR 2025] Code for "Notes-guided MLLM Reasoning: Enhancing MLLM with Knowledge and Visual Notes for Visual Question Answering".☆20Jun 16, 2025Updated 8 months ago
- [NeurIPS '25] FastDINOv2: Frequency Based Curriculum Learning Improves Robustness and Training Speed☆25Jul 26, 2025Updated 7 months ago
- JoVA: Unified Multimodal Learning for Joint Video-Audio Generation☆30Dec 22, 2025Updated 2 months ago
- [NeurIPS 2025] Chiron-o1: Igniting Multimodal Large Language Models towards Generalizable Medical Reasoning via Mentor-Intern Collaborative …☆71Oct 23, 2025Updated 4 months ago
- [ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning☆1,449Jun 26, 2025Updated 8 months ago
- UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation☆135Jun 10, 2025Updated 8 months ago
- [CVPR 2026] The official implementation of the paper "Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation"☆96Feb 21, 2026Updated last week
- [CVPR2025] Feat2GS: Probing Visual Foundation Models with Gaussian Splatting☆230Jul 25, 2025Updated 7 months ago
- [CVPR 2025] ScaleLSD: Scalable Deep Line Segment Detection Streamlined☆45Sep 25, 2025Updated 5 months ago
- Sambor: Boosting Segment Anything Model Towards Open-Vocabulary Learning☆32Dec 7, 2023Updated 2 years ago
- [CVPR 2024] Official implementation of the paper "Visual In-context Learning"☆529Apr 8, 2024Updated last year
- [ICCV 2025] VLM4D: Towards Spatiotemporal Awareness in Vision Language Models☆39Nov 20, 2025Updated 3 months ago
- ☆19Aug 7, 2025Updated 6 months ago
- PVT-SSD: Single-Stage 3D Object Detector with Point-Voxel Transformer (CVPR 2023)☆49Oct 10, 2023Updated 2 years ago
- [ICCV 2025] LIRA☆21Nov 25, 2025Updated 3 months ago
- ☆131Jun 24, 2025Updated 8 months ago
- [ICLR2025] A versatile image-to-image visual assistant, designed for image generation, manipulation, and translation based on free-from u…☆210May 5, 2025Updated 9 months ago
- A novel 4D reconstruction method that directly generates high-quality, animation-ready 4D mesh asset (.GLB file) from a single monocular …☆117Nov 24, 2025Updated 3 months ago
- ☆34Nov 4, 2025Updated 3 months ago
- ☆319Jan 24, 2026Updated last month
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception☆159Dec 6, 2024Updated last year
- PyTorch Implementation of "SMITE: Segment Me In TimE" (ICLR 2025)☆212Nov 12, 2025Updated 3 months ago
- Nano Banana Studio: AI-Powered Marketing Asset Creator with Real-Time Brand Enhancement☆39Sep 10, 2025Updated 5 months ago
- Official implementation of "LoFA: Learning to Predict Personalized Prior for Fast Adaptation of Visual Generative Models".☆33Feb 1, 2026Updated 3 weeks ago
- Official code for: PC-BEV: An Efficient Polar-Cartesian BEV Fusion Framework for LiDAR Semantic Segmentation☆16Dec 10, 2024Updated last year
- miemienet is a C++ AI deep learning inference framework. Supports PPYOLOE, PICODET.☆12Nov 4, 2022Updated 3 years ago