Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos
☆306 · Sep 28, 2025 · Updated 5 months ago
Alternatives and similar repositories for PAM
Users interested in PAM are comparing it to the libraries listed below.
- Official repository for the paper "TIIF-Bench: How Does Your T2I Model Follow Your Instructions?". ☆159 · Nov 14, 2025 · Updated 4 months ago
- [NeurIPS 2025] Native-resolution diffusion Transformer ☆285 · Oct 14, 2025 · Updated 5 months ago
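- An efficient code review tool built for developers, designed to improve review quality, reduce communication overhead, and speed up team collaboration. Features such as smart diffing, comment suggestions, change summaries, and code quality hints help developers understand submissions faster and spot potential problems, making every review clearer, more efficient, and more valuable. ☆282 · Mar 16, 2026 · Updated last week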
- Frontier CoreML audio models in your apps: text-to-speech, speech-to-text, voice activity detection, and speaker diarization. In Swift, … ☆1,689 · Updated this week
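- An AI attendant solution built for the Xianyu platform, providing 24/7 automated coverage of Xianyu with multi-expert collaborative decision-making, intelligent price negotiation, and context-aware conversation. ☆145 · Jul 19, 2025 · Updated 8 months ago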
- Real-time Google Scholar citation tracker in your macOS menu bar. ☆141 · Updated this week
- Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning ☆146 · Jun 30, 2025 · Updated 8 months ago
- Sceneform-EQR extends Google’s Sceneform Android SDK, supporting graphics, video, AR, and VR applications. It integrates ARCore, AREngine… ☆199 · Feb 2, 2026 · Updated last month
- Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding ☆211 · Oct 15, 2025 · Updated 5 months ago
- [NIPS 2025] Chiron-o1: Igniting Multimodal Large Language Models towards Generalizable Medical Reasoning via Mentor-Intern Collaborative … ☆71 · Oct 23, 2025 · Updated 5 months ago
- Official PyTorch implementation of the paper "FlowDirector: Training-Free Flow Steering for Precise Text-to-Video Editing" ☆80 · Dec 12, 2025 · Updated 3 months ago
- [ICCV 2025] LIRA ☆21 · Nov 25, 2025 · Updated 3 months ago
- 🍞 AI-Powered Interview Assistant - Your Confident Interview Companion | A smart interview assistant that brings confidence to every interview ☆73 · Jan 16, 2026 · Updated 2 months ago
- Voice to prompt, empowering your vibe coding ☆122 · Jul 7, 2025 · Updated 8 months ago
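- A powerful Feishu Open Platform API integration tool with full FastGPT AI platform integration. It supports automatic sync of Feishu knowledge bases in all formats, as well as Feishu bot integration (with full support for thinking mode, streaming output, citation downloads, and image rendering). ☆108 · Dec 21, 2025 · Updated 3 months ago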
- [arXiv'25] 🌈 Unseen 3D Geometry Reasoning from a Single Image. ☆78 · Jul 10, 2025 · Updated 8 months ago
- Nexent is a zero-code platform for auto-generating agents: no orchestration, no complex drag-and-drop required. Nexent also offers power… ☆4,322 · Updated this week
- MacOS hardware performance monitoring CLI tool with a focus on AI Workloads ☆25 · Jul 7, 2025 · Updated 8 months ago
- [CVPR 2025] A Unified Image-Dense Annotation Generation Model for Underwater Scenes ☆55 · Apr 9, 2025 · Updated 11 months ago
- ☆22 · May 30, 2023 · Updated 2 years ago
- [CVPR 2026] FluxMem: Adaptive Hierarchical Memory for Streaming Video Understanding ☆43 · Mar 16, 2026 · Updated last week
- https://avocado-captioner.github.io/ ☆31 · Oct 16, 2025 · Updated 5 months ago
- Code for "BlinkVision: A Benchmark for Optical Flow, Scene Flow and Point Tracking Estimation using RGB Frames and Events", ECCV 2024 and… ☆20 · Feb 13, 2025 · Updated last year
- The official repo of the paper "MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly" ☆179 · Feb 27, 2026 · Updated 3 weeks ago
- [CVPR 2026] Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models ☆173 · Feb 25, 2026 · Updated 3 weeks ago
- [ICLR2025] A versatile image-to-image visual assistant, designed for image generation, manipulation, and translation based on free-form u… ☆210 · May 5, 2025 · Updated 10 months ago
- (CVPR 26 Findings) Official implementation of the paper "Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-… ☆34 · Sep 25, 2025 · Updated 5 months ago
- [NeurIPS 2025] "DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling" ☆97 · Dec 21, 2025 · Updated 3 months ago
- [ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning ☆1,469 · Jun 26, 2025 · Updated 8 months ago
- Official Pytorch Implementation of Self-emerging Token Labeling ☆35 · Mar 27, 2024 · Updated last year
- [CVPR 2026] The official implementation of the paper "Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation" ☆108 · Feb 28, 2026 · Updated 3 weeks ago
- Inverse Tiling of 2D Finite Domains (Siggraph Asia 2025) ☆58 · Oct 6, 2025 · Updated 5 months ago
- [ICLR2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want ☆95 · Dec 1, 2025 · Updated 3 months ago
- [ICCV2025] Referring any person or objects given a natural language description. Code base for RexSeek and HumanRef Benchmark ☆178 · Oct 15, 2025 · Updated 5 months ago
- [ICCV 2025] VLM4D: Towards Spatiotemporal Awareness in Vision Language Models ☆42 · Nov 20, 2025 · Updated 4 months ago
- UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation ☆135 · Jun 10, 2025 · Updated 9 months ago
- [NeurIPS 2025] Streaming 3D Reconstruction with Explicit Spatial Pointer Memory ☆181 · Mar 10, 2026 · Updated last week
- [CVPR 2024] Official implementation of the paper "Visual In-context Learning" ☆531 · Apr 8, 2024 · Updated last year
- [CVPR2025] Official code repository for SeTa: "Scale Efficient Training for Large Datasets" ☆23 · Mar 18, 2025 · Updated last year