Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos
☆279Sep 28, 2025Updated 7 months ago
Alternatives and similar repositories for PAM
Users that are interested in PAM are comparing it to the libraries listed below.
- Crypto & Cross‑Asset Event Study Toolkit — Cross‑Asset Event Study Analysis Repository☆288Jun 22, 2025Updated 10 months ago
- Official repository for the paper "TIIF-Bench: How Does Your T2I Model Follow Your Instructions?".☆128Nov 14, 2025Updated 5 months ago
- [NeurIPS 2025] Native-resolution diffusion Transformer☆234Oct 14, 2025Updated 6 months ago
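- An efficient code review tool built for developers, designed to improve review quality, reduce communication costs, and speed up team collaboration. With smart diffing, comment suggestions, change summaries, and code quality hints, it helps developers understand commits faster and spot potential problems, making every review clearer, more efficient, and more valuable.☆241Mar 16, 2026Updated last month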
- ☆120Jul 11, 2025Updated 9 months ago
- ⛲Imagination, Given Voice.✨☆703Updated this week
- A powerful serialization framework for Python objects with automatic type registration and validation. Extract from AgentSmith, released …☆14Mar 2, 2026Updated 2 months ago
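- An AI attendant solution built for the Xianyu platform, providing 7×24 automated attendance on Xianyu with multi-expert collaborative decision-making, intelligent price negotiation, and context-aware dialogue.☆120Jul 19, 2025Updated 9 months ago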
- [ICLR-2026] Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning☆148Jun 30, 2025Updated 10 months ago
- Real-time Google Scholar citation tracker in your macOS menu bar.☆105Updated this week
- Sceneform-EQR extends Google’s Sceneform Android SDK, supporting graphics, video, AR, and VR applications. It integrates ARCore, AREngine…☆161Feb 2, 2026Updated 3 months ago
- Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding☆212Oct 15, 2025Updated 6 months ago
- [NIPS 2025] Chiron-o1: Igniting Multimodal Large Language Models towards Generalizable Medical Reasoning via Mentor-Intern Collaborative …☆60Oct 23, 2025Updated 6 months ago
- Official PyTorch implementation of the paper "FlowDirector: Training-Free Flow Steering for Precise Text-to-Video Editing"☆83Dec 12, 2025Updated 4 months ago
- [ICCV 2025] LIRA☆21Nov 25, 2025Updated 5 months ago
- 🍞 AI-Powered Interview Assistant - Your Confident Interview Companion | An intelligent interview assistant that lets you go into every interview with confidence☆50Jan 16, 2026Updated 3 months ago
- Voice to prompt, empowering your vibe coding☆95Jul 7, 2025Updated 9 months ago
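- Unified message push platform: supports pushing to email and client apps☆23Jan 10, 2026Updated 3 months ago
- A powerful Feishu Open Platform API integration tool, fully integrated with the FastGPT AI platform. It supports automatic synchronization of all Feishu knowledge base formats and integration with Feishu bots (with full support for thinking mode, streaming output, citation downloads, and image rendering).☆76Dec 21, 2025Updated 4 months ago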
- Nexent is a zero-code platform for auto-generating production-grade AI agents using Harness Engineering principles — unified tools, skill…☆4,360Updated this week
- [CVPR 2025] A Unified Image-Dense Annotation Generation Model for Underwater Scenes☆56Apr 9, 2025Updated last year
- ☆22May 30, 2023Updated 2 years ago
- Official implementation of "MAPGD: Multi-Agent Prompt Gradient Descent for Collaborative Prompt Optimization"☆21Nov 26, 2025Updated 5 months ago
- MacOS hardware performance monitoring CLI tool with a focus on AI Workloads☆33Jul 7, 2025Updated 9 months ago
- [CVPR 2026] FluxMem: Adaptive Hierarchical Memory for Streaming Video Understanding☆57Mar 16, 2026Updated last month
- The official repo of the paper "MMLongBench Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly"☆170Apr 9, 2026Updated 3 weeks ago
- [CVPR 2026] Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models☆176Feb 25, 2026Updated 2 months ago
- https://avocado-captioner.github.io/☆33Oct 16, 2025Updated 6 months ago
- [ICLR2025] A versatile image-to-image visual assistant, designed for image generation, manipulation, and translation based on free-from u…☆210May 5, 2025Updated 11 months ago
- (CVPR 26 Findings) Official implementation of the paper "Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-…☆34Apr 7, 2026Updated 3 weeks ago
- [ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning☆1,486Jun 26, 2025Updated 10 months ago
- Official Pytorch Implementation of Self-emerging Token Labeling☆35Mar 27, 2024Updated 2 years ago
- [CVPR 2026] The official implementation of the paper "Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation"☆112Feb 28, 2026Updated 2 months ago
- [CVPR 2025 highlight] Generating 6DoF Object Manipulation Trajectories from Action Description in Egocentric Vision☆40Dec 2, 2025Updated 5 months ago
- Inverse Tiling of 2D Finite Domains (Siggraph Asia 2025)☆38Oct 6, 2025Updated 6 months ago
- Comparing hybrid ResNet50+ViT models with a pure ResNet18 CNN on a mixed dataset! Wanted to see how these different architectur…☆20Dec 8, 2025Updated 4 months ago
- [ICLR2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want☆96Dec 1, 2025Updated 5 months ago
- UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation☆136Jun 10, 2025Updated 10 months ago
- [ICCV2025] Referring any person or objects given a natural language description. Code base for RexSeek and HumanRef Benchmark☆182Oct 15, 2025Updated 6 months ago