Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos
☆302Sep 28, 2025Updated 6 months ago
Alternatives and similar repositories for PAM
Users that are interested in PAM are comparing it to the libraries listed below.
- Fully local, no dependency scribe. Speak into your microphone and summarize. Requires iOS 26 and macOS 26 to use the advanced transcripti…☆318Sep 30, 2025Updated 6 months ago
- Crypto & Cross‑Asset Event Study Toolkit — Cross‑Asset Event Study Analysis Repository☆351Jun 22, 2025Updated 9 months ago
- Official repository for the paper "TIIF-Bench: How Does Your T2I Model Follow Your Instructions?".☆155Nov 14, 2025Updated 4 months ago
- [NeurIPS 2025] Native-resolution diffusion Transformer☆251Oct 14, 2025Updated 5 months ago
- ☆131Jul 11, 2025Updated 9 months ago
- ⛲Imagination, Given Voice.✨☆774Updated this week
- Frontier CoreML audio models in your apps — text-to-speech, speech-to-text, voice activity detection, and speaker diarization. In Swift, …☆1,836Updated this week
- [ICLR-2026] Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning☆146Jun 30, 2025Updated 9 months ago
- Real-time Google Scholar citation tracker in your macOS menu bar.☆138Apr 3, 2026Updated last week
- Sceneform-EQR extends Google’s Sceneform Android SDK, supporting graphics, video, AR, and VR applications. It integrates ARCore, AREngine…☆177Feb 2, 2026Updated 2 months ago
- Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding☆212Oct 15, 2025Updated 5 months ago
- [NIPS 2025] Chiron-o1: Igniting Multimodal Large Language Models towards Generalizable Medical Reasoning via Mentor-Intern Collaborative …☆65Oct 23, 2025Updated 5 months ago
- Official PyTorch implementation of the paper "FlowDirector: Training-Free Flow Steering for Precise Text-to-Video Editing"☆80Dec 12, 2025Updated 4 months ago
- [ICCV 2025] LIRA☆21Nov 25, 2025Updated 4 months ago
- 🍞 AI-Powered Interview Assistant - Your Confident Interview Companion | An intelligent interview assistant that lets you approach every interview with confidence☆61Jan 16, 2026Updated 2 months ago
- Voice to prompt, empowering your vibe coding☆117Jul 7, 2025Updated 9 months ago
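- Unified message push platform: supports pushing to email and client apps☆24Jan 10, 2026Updated 3 months ago
- A powerful Feishu Open Platform API integration tool, fully integrated with the FastGPT AI platform. Supports automatic synchronization of Feishu knowledge bases in all formats and integration with Feishu bots (with full support for thinking mode, streaming output, citation download, and image rendering)☆84Dec 21, 2025Updated 3 months ago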
- [arXiv'25]🌈 Unseen 3D Geometry Reasoning from a Single Image.☆80Jul 10, 2025Updated 9 months ago
- Nexent is a zero-code platform for auto-generating production-grade AI agents using Harness Engineering principles — unified tools, skill…☆4,432Updated this week
- [CVPR 2025] A Unified Image-Dense Annotation Generation Model for Underwater Scenes☆55Apr 9, 2025Updated last year
- macOS hardware performance monitoring CLI tool with a focus on AI workloads☆29Jul 7, 2025Updated 9 months ago
- Official implementation of "MAPGD: Multi-Agent Prompt Gradient Descent for Collaborative Prompt Optimization"☆26Nov 26, 2025Updated 4 months ago
- [CVPR 2026] FluxMem: Adaptive Hierarchical Memory for Streaming Video Understanding☆49Mar 16, 2026Updated 3 weeks ago
- Code for "BlinkVision: A Benchmark for Optical Flow, Scene Flow and Point Tracking Estimation using RGB Frames and Events", ECCV 2024 and…☆20Feb 13, 2025Updated last year
- The official repo of the paper "MMLongBench Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly"☆175Feb 27, 2026Updated last month
- [CVPR 2026] Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models☆175Feb 25, 2026Updated last month
- [ICLR2025] A versatile image-to-image visual assistant, designed for image generation, manipulation, and translation based on free-from u…☆210May 5, 2025Updated 11 months ago
- (CVPR 26 Findings) Official implementation of the paper "Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-…☆34Sep 25, 2025Updated 6 months ago
- [NeurIPS 2025]"DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling"☆99Dec 21, 2025Updated 3 months ago
- [ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning☆1,477Jun 26, 2025Updated 9 months ago
- Official PyTorch Implementation of Self-emerging Token Labeling☆35Mar 27, 2024Updated 2 years ago
- [CVPR 2026] The official implementation of the paper "Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation"☆110Feb 28, 2026Updated last month
- [CVPR 2025 highlight] Generating 6DoF Object Manipulation Trajectories from Action Description in Egocentric Vision☆40Dec 2, 2025Updated 4 months ago
- Inverse Tiling of 2D Finite Domains (Siggraph Asia 2025)☆56Oct 6, 2025Updated 6 months ago
- [ICCV 2025] VLM4D: Towards Spatiotemporal Awareness in Vision Language Models☆45Nov 20, 2025Updated 4 months ago
- [ICLR2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want☆96Dec 1, 2025Updated 4 months ago
- UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation☆136Jun 10, 2025Updated 10 months ago
- [ICCV2025] Referring to any person or object given a natural language description. Code base for RexSeek and HumanRef Benchmark☆180Oct 15, 2025Updated 5 months ago