Automated image & video captioning using Qwen-VL, Gemma4 and SAM3.
☆91Apr 27, 2026Updated 2 months ago
Alternatives and similar repositories for VisionCaptioner
Users that are interested in VisionCaptioner are comparing it to the libraries listed below.
- Uses SAM3 to transfer 2D semantics onto 3D point clouds, producing segmented training data and efficient scene graphs for indoor environm…☆84Jan 23, 2026Updated 5 months ago
- Doom for Gear VR☆19Jun 4, 2019Updated 7 years ago
- [NeurIPS 2025] UniRelight: Learning Joint Decomposition and Synthesis for Video Relighting☆115Apr 8, 2026Updated 2 months ago
- Deploy your own private OpenAI-compatible LLM☆27Jun 5, 2025Updated last year
- ☆14Jun 25, 2022Updated 4 years ago
- A ComfyUI node created for identity consistency with FLUX.2 klein 9b models.☆60May 13, 2026Updated last month
- ComfyUI custom node for generating prompts from images. Supports Qwen2.5 and Qwen3 (Instruct/Thinking) models, as well as the OpenAI API.☆26Jan 10, 2026Updated 5 months ago
- Portable Version Of WanGP. A fast AI Video Generator for the GPU Poor. Supports Wan 2.1/2.2, Qwen Image, Hunyuan Video, LTX Video and Fl…☆36Jun 24, 2026Updated last week
- A native application with agentic support (MCP) for ultra-fast AI image generation using a highly optimized Z-Image-Turbo model with SD…☆32Feb 3, 2026Updated 5 months ago
- AllJoyn Device Service Bridge for Philips Hue☆11Jan 20, 2016Updated 10 years ago
- Control Virt-A-Mate (NSFW) commands with the keyboard or gamepads☆14Sep 21, 2020Updated 5 years ago
- ComfyUI nodes for Wan 2.2 SVI 2 Pro with Keyframe control via First/Last Frame and seamless video stitching.☆72Apr 22, 2026Updated 2 months ago
- Tabletop manipulation calibration between the robot arm, the fixed cameras, and the in-hand camera.☆13Apr 12, 2024Updated 2 years ago
- ☆12Nov 24, 2018Updated 7 years ago
- Code showing how to transcribe audio to text using fairseq_meta_mms (Google Colab version)☆19May 25, 2023Updated 3 years ago
- Pathtracing☆14Jan 7, 2025Updated last year
- Desktop app built with Compose Multiplatform for using Qwen3-TTS through a UI.☆66Updated this week
- Wiki☆11Mar 11, 2020Updated 6 years ago
- Unmaintained - .NET VRPN server and framework for Kinect voice recognition and skeletal tracking☆14Feb 6, 2019Updated 7 years ago
- Official implementation of the benchmarked 2D and 3D classification and 3D semantic segmentation models on PeRFception.☆14Jan 21, 2023Updated 3 years ago
- ☆10Aug 13, 2012Updated 13 years ago
- [CVPR 2026 Highlight] XL-VLA: Cross-Hand Latent Representation for Vision-Language-Action Models☆98Apr 15, 2026Updated 2 months ago
- Tool to automatically generate text descriptions for images using Ollama vision models (LLaVA, Qwen3-VL, Llama Vision)☆36Dec 14, 2025Updated 6 months ago
- ☆23Mar 31, 2025Updated last year
- MIMIC: Masked Image Modeling with Image Correspondences☆16Jun 14, 2024Updated 2 years ago
- ☆12Feb 23, 2024Updated 2 years ago
- ☆19Jan 21, 2025Updated last year
- Gives the option to unload one or all models based on memory needs in your flow.☆27Jun 30, 2024Updated 2 years ago
- Zunda: Japanese Enhanced Modality Analyzer client for Python.☆10Nov 30, 2019Updated 6 years ago
- ☆19Jan 30, 2023Updated 3 years ago
- SteamVR support for the Razer Hydra, and possibly other OSVR supported controllers.☆10Apr 19, 2016Updated 10 years ago
- [ICCV 2025] HouseTour: A Virtual Real Estate A(I)gent☆49Oct 22, 2025Updated 8 months ago
- Robo3R: Enhancing Robotic Manipulation with Accurate Feed-Forward 3D Reconstruction☆60Jun 21, 2026Updated last week
- Karaoke lyrics plugins for Flutter☆10Apr 30, 2022Updated 4 years ago
- Implementation for DIY-SC paper.☆28Jul 14, 2025Updated 11 months ago
- FlowPath is a free and open source ComfyUI custom node that intelligently organizes your AI-generated images.☆40Feb 5, 2026Updated 4 months ago
- IME for Mac.☆10Jun 18, 2026Updated 2 weeks ago
- Beautiful voice app: record or upload to train a voice, generate speech from text or files, save & download voices.☆65Oct 19, 2025Updated 8 months ago
- ☆10May 5, 2021Updated 5 years ago