Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos
☆280Sep 28, 2025Updated 7 months ago
Alternatives and similar repositories for PAM
Users that are interested in PAM are comparing it to the libraries listed below.
- Fully local, no dependency scribe. Speak into your microphone and summarize. Requires iOS 26 and MacOS 26 to use the advanced transcripti…☆288Sep 30, 2025Updated 7 months ago
- Crypto & Cross‑Asset Event Study Toolkit — Cross‑Asset Event Study Analysis Repository☆277Jun 22, 2025Updated 11 months ago
- Official repository for the paper "TIIF-Bench: How Does Your T2I Model Follow Your Instructions?".☆128Nov 14, 2025Updated 6 months ago
- [NeurIPS 2025] Native-resolution diffusion Transformer☆235Oct 14, 2025Updated 7 months ago
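- An efficient code review tool built for developers, designed to improve review quality, reduce communication overhead, and speed up team collaboration. With smart diffing, comment suggestions, change summaries, and code quality hints, it helps developers understand commits faster and spot potential issues, making every review clearer, more efficient, and more valuable.☆241Mar 16, 2026Updated 2 months ago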
- ☆119Jul 11, 2025Updated 10 months ago
- ⛲Imagination, Given Voice.✨☆700Updated this week
- Frontier CoreML audio models in your apps — text-to-speech, speech-to-text, voice activity detection, and speaker diarization. In Swift, …☆2,047Updated this week
- A powerful serialization framework for Python objects with automatic type registration and validation. Extract from AgentSmith, released …☆14Mar 2, 2026Updated 2 months ago
- [ICLR-2026] Rex-Thinker: Grounded Object Refering via Chain-of-Thought Reasoning☆148Jun 30, 2025Updated 10 months ago
- Sceneform-EQR extends Google’s Sceneform Android SDK, supporting graphics, video, AR, and VR applications. It integrates ARCore, AREngine…☆161May 13, 2026Updated last week
- Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding☆213Oct 15, 2025Updated 7 months ago
- [NIPS 2025] Chiron-o1: Igniting Multimodal Large Language Models towards Generalizable Medical Reasoning via Mentor-Intern Collaborative …☆59Oct 23, 2025Updated 6 months ago
- Official PyTorch implementation of the paper "FlowDirector: Training-Free Flow Steering for Precise Text-to-Video Editing"☆84Dec 12, 2025Updated 5 months ago
- [ICCV 2025] LIRA☆22Nov 25, 2025Updated 5 months ago
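- 🍞 AI-Powered Interview Assistant - Your Confident Interview Companion | An intelligent interview assistant that lets you approach every interview with confidence☆50Jan 16, 2026Updated 4 months ago
- Unified message push platform: supports pushing to email and client apps☆22Jan 10, 2026Updated 4 months ago
- A powerful API integration tool for the Feishu Open Platform, fully integrated with the FastGPT AI platform; supports automatic synchronization of Feishu knowledge bases in all formats and integration with Feishu bots (with full support for thinking mode, streaming output, citation downloads, and image rendering)☆68Dec 21, 2025Updated 5 months ago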
- [arXiv'25]🌈 Unseen 3D Geometry Reasoning from a Single Image.☆82Jul 10, 2025Updated 10 months ago
- [CVPR 2025] A Unified Image-Dense Annotation Generation Model for Underwater Scenes☆56Apr 9, 2025Updated last year
- Nexent is a zero-code platform for auto-generating production-grade AI agents using Harness Engineering principles — unified tools, skill…☆4,541Updated this week
- ☆22May 30, 2023Updated 2 years ago
- Official implementation of "MAPGD: Multi-Agent Prompt Gradient Descent for Collaborative Prompt Optimization"☆17Nov 26, 2025Updated 5 months ago
- MacOS hardware performance monitoring CLI tool with a focus on AI Workloads☆36Jul 7, 2025Updated 10 months ago
- Code for "BlinkVision: A Benchmark for Optical Flow, Scene Flow and Point Tracking Estimation using RGB Frames and Events", ECCV 2024 and…☆20Feb 13, 2025Updated last year
- The official repo of the paper "MMLongBench Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly"☆170Apr 9, 2026Updated last month
- [CVPR 2026] Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models☆178Feb 25, 2026Updated 2 months ago
- https://avocado-captioner.github.io/☆34Oct 16, 2025Updated 7 months ago
- [ICLR2025] A versatile image-to-image visual assistant, designed for image generation, manipulation, and translation based on free-from u…☆210May 5, 2025Updated last year
- (CVPR 26 Findings) Official implementation of the paper "Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-…☆35Apr 7, 2026Updated last month
- Implementation of D4RT, Efficiently Reconstructing Dynamic Scenes, from Deepmind☆64Updated this week
- [ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning☆1,487Jun 26, 2025Updated 10 months ago
- Official PyTorch Implementation of Self-emerging Token Labeling☆35Mar 27, 2024Updated 2 years ago
- [CVPR 2026] The official implementation of the paper "Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation"☆113Feb 28, 2026Updated 2 months ago
- [CVPR 2025 highlight] Generating 6DoF Object Manipulation Trajectories from Action Description in Egocentric Vision☆42Dec 2, 2025Updated 5 months ago
- Inverse Tiling of 2D Finite Domains (Siggraph Asia 2025)☆32Oct 6, 2025Updated 7 months ago
- Comparing hybrid ResNet50+ViT models with a pure ResNet18 CNN on a mixed dataset, to see how these different architectur…☆20Dec 8, 2025Updated 5 months ago
- [ICLR2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want☆96Dec 1, 2025Updated 5 months ago
- UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation☆136Jun 10, 2025Updated 11 months ago