Sanster / VLM-demos
A collection of VLM models that can be tried online.
☆13Updated last year
Alternatives and similar repositories for VLM-demos:
Users who are interested in VLM-demos are comparing it to the repositories listed below.
- A minimalistic, hackable code base to finetune Wan video generation model☆38Updated last week
- ComfyUI wrapper for Moondream's gaze detection☆51Updated 2 months ago
- ☆46Updated last year
- Official implementations for paper: DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models☆15Updated last year
- Uses Qwen to create prompts for SDXL☆32Updated last year
- Fine-tune of Florence-2 for shot categorization.☆24Updated last month
- ComfyUI node for fast neural style transfer☆71Updated 2 weeks ago
- Developer-friendly inference code for RVC-Boss/GPT-SoVITS.☆15Updated 6 months ago
- Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.☆11Updated 9 months ago
- ☆32Updated 3 weeks ago
- Diffusers Image Fill v3 -- Inpaint or Remove objects from an image - or Outpaint - or Outpaint Video Zoom: 16GB+ GPU | 32GB+ RAM | 20GB+…☆12Updated 5 months ago
- Just a subfolder of https://github.com/siliconflow/onediff☆21Updated 10 months ago
- ImageSlider custom component for gradio.☆42Updated 11 months ago
- Incredibly descriptive audiovisual summaries for videos☆40Updated 8 months ago
- ☆25Updated 10 months ago
- Implementation for the paper "ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems".☆160Updated last month
- Empirical Study Towards Building An Effective Multi-Modal Large Language Model☆22Updated last year
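- The simplest reproduction of R1 results on a small model, illustrating the most important essence of O1-like models and DeepSeek R1. Think is all you need. Experiments support the claim that, for strong reasoning ability, the content of the thinking process is the core of AGI/ASI.☆43Updated 2 months ago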
- ☆26Updated 8 months ago
- In-browser image segmentation via Transformers.js, Service Worker, Nuxt☆24Updated last year
- ☆29Updated last year
- ☆32Updated 3 months ago
- Chrome extension to add a link from each Arxiv page to the corresponding HF Paper page☆25Updated last year
- ☆51Updated last month
- GLM Series Edge Models☆136Updated 2 months ago
- ☆16Updated 9 months ago
- ComfyUI YOLO-World Integration☆41Updated 9 months ago
- XVERSE-MoE-A4.2B: A multilingual large language model developed by XVERSE Technology Inc.☆37Updated 11 months ago
- ☆14Updated 3 months ago
- EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything☆17Updated last year