chenxwh / Grounded-Segment-Anything
Marrying Grounding DINO with Segment Anything & Stable Diffusion & Tag2Text & BLIP & Whisper & ChatBot - Automatically Detect, Segment and Generate Anything with Image, Text, and Audio Inputs
☆13 · Updated last year
Alternatives and similar repositories for Grounded-Segment-Anything:
Users who are interested in Grounded-Segment-Anything are comparing it to the repositories listed below.
- A PoC to run Segment Anything Model (SAM) entirely in the browser without any backend ☆65 · Updated last year
- Implementation of Grounding DINO & Segment Anything that allows masking based on a prompt, which is useful for programmatic inpainting. ☆35 · Updated last year
- [ICCV2023] Segment Every Reference Object in Spatial and Temporal Spaces ☆234 · Updated last year
- A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions. ☆107 · Updated 4 months ago
- ☆82 · Updated last year
- An attempt at an SVD inpainting pipeline ☆51 · Updated last year
- Incredibly descriptive audiovisual summaries for videos ☆40 · Updated 5 months ago
- Grounded Segment Anything: From Objects to Parts ☆393 · Updated last year
- Demonstration of MobileSAM in the browser enabled through ONNX runtime web ☆97 · Updated last year
- Official Repo of Graphist ☆107 · Updated 9 months ago
- ☆170 · Updated 6 months ago
- ☆29 · Updated last year
- (CVPR 2023) SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation ☆27 · Updated 7 months ago
- A multi-modal AI model that can generate high-quality novel videos with text, images, or video clips. ☆65 · Updated last year
- Video shot transition detection ☆21 · Updated last year
- Retrieval-Augmented Video Generation for Telling a Story ☆252 · Updated 11 months ago
- ☆64 · Updated 4 months ago
- SDXL LCM Multi-controlnet with loras ☆15 · Updated last year
- Cog wrapper for MagicAnimate ☆30 · Updated last year
- Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding ☆251 · Updated 5 months ago
- Marrying Grounding DINO with Segment Anything & Stable Diffusion & BLIP - Automatically Detect, Segment and Generate Anything with Image… ☆20 · Updated last year
- [Arxiv-2024] MotionLLM: Understanding Human Behaviors from Human Motions and Videos ☆278 · Updated 4 months ago
- [IJCV'24] AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort ☆146 · Updated 2 months ago
- Repository for 23'MM accepted paper "Curriculum-Listener: Consistency- and Complementarity-Aware Audio-Enhanced Temporal Sentence Groundi… ☆47 · Updated last year
- Codebase for the Recognize Anything Model (RAM) ☆71 · Updated last year
- Official code of "EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model" ☆340 · Updated 2 weeks ago
- [CVPR 2024] VCoder: Versatile Vision Encoders for Multimodal Large Language Models ☆271 · Updated 9 months ago
- ☆29 · Updated 11 months ago
- 2nd place solution for the Generative Interior Design 2024 competition ☆97 · Updated last month
- VimTS: A Unified Video and Image Text Spotter ☆75 · Updated 2 months ago