Sanster / VLM-demos
A collection of VLM models that can be tried online.
☆14Updated last year
Alternatives and similar repositories for VLM-demos
Users that are interested in VLM-demos are comparing it to the libraries listed below
- ☆47Updated last year
- Run Open Source Local AI Models in Excel with Ollama☆24Updated 5 months ago
- [EMNLP 2025 Demo] PresentAgent: Multimodal Agent for Presentation Video Generation☆127Updated 2 months ago
- Auto Thinking Mode switch for Qwen3 in Open webui☆70Updated 8 months ago
- Stream live plots to a matplotlib figure☆80Updated 9 months ago
- ComfyUI YOLO-World Integration☆48Updated last year
- Use Florence 2 to auto-label data for use in training fine-tuned object detection models.☆69Updated last year
- Zero-copy multimodal vector DB with CUDA and CLIP/SigLIP☆64Updated 8 months ago
- Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval And Synthesis For SLMs☆52Updated 3 months ago
- Tencent Hunyuan 7B (Hunyuan-7B for short) is one of Tencent Hunyuan's dense large language models☆71Updated 5 months ago
- React application using Segment Anything in browser☆10Updated 2 years ago
- Use Qwen to create prompts for SDXL☆34Updated 2 years ago
- In-browser image segmentation via Transformers.js, Service Worker, Nuxt☆25Updated last year
- XVERSE-MoE-A36B: A multilingual large language model developed by XVERSE Technology Inc.☆38Updated last year
- An AIGC application that integrates LLM and SDXL☆29Updated 2 years ago
- ☆72Updated 2 months ago
- Implementation for the paper "ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems".☆198Updated last month
- Learning records for building a large language model from scratch☆58Updated last year
- ComfyUI wrapper for Moondream's gaze detection☆56Updated 11 months ago
- Real-time video understanding and interaction through text, audio, image, and video with a large multimodal model. A framework for real-time video understanding and interaction using large multimodal models, via text…☆26Updated 2 years ago
- codewithgpu.com python client package☆20Updated 2 years ago
- ☆16Updated last year
- Official implementation of the paper "DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models"☆15Updated 2 years ago
- ☆26Updated last year
- XVERSE-MoE-A4.2B: A multilingual large language model developed by XVERSE Technology Inc.☆39Updated last year
- Exploration of World Languages☆19Updated last year
- Python scripts performing Open Vocabulary Object Detection using the YOLO-World model in ONNX.☆62Updated last year
- ImageSlider custom component for gradio.☆43Updated last year
- ☆49Updated 4 months ago
- ☆24Updated last year