Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos
☆282Sep 28, 2025Updated 9 months ago
Alternatives and similar repositories for PAM
Users who are interested in PAM are comparing it to the libraries listed below.
- Fully local, no dependency scribe. Speak into your microphone and summarize. Requires iOS 26 and macOS 26 to use the advanced transcripti…☆299Sep 30, 2025Updated 9 months ago
- Crypto & Cross‑Asset Event Study Toolkit — Cross‑Asset Event Study Analysis Repository☆276Jun 22, 2025Updated last year
- Official repository for the paper "TIIF-Bench: How Does Your T2I Model Follow Your Instructions?".☆129Updated this week
- [NeurIPS 2025] Native-resolution diffusion Transformer☆234Oct 14, 2025Updated 8 months ago
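- An efficient code review tool built for developers, designed to improve review quality, reduce communication overhead, and speed up team collaboration. With smart diffing, comment suggestions, change summaries, and code quality hints, it helps developers understand commits faster and spot potential problems, making every review clearer, more efficient, and more valuable.☆251Mar 16, 2026Updated 3 months ago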
- ☆119Jul 11, 2025Updated 11 months ago
- ⛲Imagination, Given Voice.✨☆708Jun 25, 2026Updated last week
- Frontier CoreML audio models in your apps — text-to-speech, speech-to-text, voice activity detection, and speaker diarization. In Swift, …☆2,359Updated this week
- A powerful serialization framework for Python objects with automatic type registration and validation. Extract from AgentSmith, released …☆14Mar 2, 2026Updated 4 months ago
- [ICLR-2026] Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning☆150Jun 30, 2025Updated last year
- Real-time Google Scholar citation tracker in your macOS menu bar.☆92May 31, 2026Updated last month
- Sceneform-EQR extends Google’s Sceneform Android SDK, supporting graphics, video, AR, and VR applications. It integrates ARCore, AREngine…☆161May 24, 2026Updated last month
- Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding☆214Oct 15, 2025Updated 8 months ago
- [NIPS 2025] Chiron-o1: Igniting Multimodal Large Language Models towards Generalizable Medical Reasoning via Mentor-Intern Collaborative …☆59Oct 23, 2025Updated 8 months ago
- Official PyTorch implementation of the paper "FlowDirector: Training-Free Flow Steering for Precise Text-to-Video Editing"☆87Dec 12, 2025Updated 6 months ago
- [ICCV 2025] LIRA☆22Nov 25, 2025Updated 7 months ago
- 🍞 AI-Powered Interview Assistant - Your Confident Interview Companion | An intelligent interview assistant for approaching every interview with confidence☆53Jan 16, 2026Updated 5 months ago
- Voice to prompt, empowering your vibe coding☆95Jul 7, 2025Updated 11 months ago
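- Unified message push platform: supports pushing to email and client apps☆22Jun 2, 2026Updated 3 weeks ago
- A powerful integration tool for the Feishu Open Platform API, fully integrated with the FastGPT AI platform. Supports automatic syncing of Feishu knowledge bases in all formats and Feishu bot integration (with full support for thinking mode, streaming output, citation downloads, and image rendering)☆71Dec 21, 2025Updated 6 months ago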
- [CVPR 2025] A Unified Image-Dense Annotation Generation Model for Underwater Scenes☆58Apr 9, 2025Updated last year
- ☆22May 30, 2023Updated 3 years ago
- Nexent is a zero-code platform for auto-generating production-grade AI agents using Harness Engineering principles — unified tools, skill…☆5,371Updated this week
- macOS hardware performance monitoring CLI tool with a focus on AI workloads☆62Jul 7, 2025Updated 11 months ago
- [CVPR 2026] FluxMem: Adaptive Hierarchical Memory for Streaming Video Understanding☆69Mar 16, 2026Updated 3 months ago
- The official repo of the paper "MMLongBench Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly"☆175Jun 13, 2026Updated 2 weeks ago
- [CVPR 2026] Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models☆178Feb 25, 2026Updated 4 months ago
- [ICLR2025] A versatile image-to-image visual assistant, designed for image generation, manipulation, and translation based on free-form u…☆210May 5, 2025Updated last year
- (CVPR 26 Findings) Official implementation of the paper "Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-…☆34Apr 7, 2026Updated 2 months ago
- Implementation of D4RT, Efficiently Reconstructing Dynamic Scenes, from Deepmind☆72Jun 20, 2026Updated last week
- [ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning☆1,497Jun 26, 2025Updated last year
- Official PyTorch Implementation of Self-emerging Token Labeling☆35Mar 27, 2024Updated 2 years ago
- https://avocado-captioner.github.io/☆37Oct 16, 2025Updated 8 months ago
- [CVPR 2026] The official implementation of the paper "Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation"☆116Feb 28, 2026Updated 4 months ago
- [CVPR 2025 highlight] Generating 6DoF Object Manipulation Trajectories from Action Description in Egocentric Vision☆46Dec 2, 2025Updated 6 months ago
- Inverse Tiling of 2D Finite Domains (Siggraph Asia 2025)☆32Jun 15, 2026Updated 2 weeks ago
- [ICLR2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want☆94Dec 1, 2025Updated 7 months ago
- UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation☆136Jun 10, 2025Updated last year
- [ICCV2025] Referring any person or objects given a natural language description. Code base for RexSeek and HumanRef Benchmark☆184Oct 15, 2025Updated 8 months ago