Official Implementation of "Pix2Cap-COCO: Advancing Visual Comprehension via Pixel-Level Captioning"
☆28 · Dec 16, 2025 · Updated 5 months ago
Alternatives and similar repositories for pix2cap
Users that are interested in pix2cap are comparing it to the libraries listed below.
- FNIN: A Fourier Neural Operator-based Numerical Integration Network for Surface-form-gradients ☆14 · Jan 22, 2025 · Updated last year
- [AAAI 2025] Official Implementation of "FOCUS: Towards Universal Foreground Segmentation" ☆58 · Jul 8, 2025 · Updated 10 months ago
- [NeurIPS-W 2025] Official Implementation of "Seg-R1: Segmentation Can Be Surprisingly Simple with Reinforcement Learning" ☆68 · Jul 1, 2025 · Updated 10 months ago
- Problem-Oriented Segmentation and Retrieval (EMNLP 2024 Findings) ☆34 · Nov 12, 2024 · Updated last year
- SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability ☆17 · May 8, 2025 · Updated last year
- ☆18 · May 18, 2026 · Updated last week
- ☆25 · Dec 26, 2024 · Updated last year
- [NeurIPS 2025] The official repository of "Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tun… ☆40 · Feb 20, 2025 · Updated last year
- ☆31 · Jan 18, 2026 · Updated 4 months ago
- The official implementation of Cross-Task Experience Sharing (COPS) ☆29 · Oct 23, 2024 · Updated last year
- ☆21 · Jan 17, 2025 · Updated last year
- Improving Your Model Ranking on Chatbot Arena by Vote Rigging (ICML 2025) ☆27 · Feb 25, 2025 · Updated last year
- [CVPR 2026] FluxMem: Adaptive Hierarchical Memory for Streaming Video Understanding ☆63 · Mar 16, 2026 · Updated 2 months ago
- Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning ☆43 · Mar 2, 2026 · Updated 2 months ago
- [ECCV 2024] Beyond MOT: Semantic Multi-Object Tracking ☆30 · Sep 12, 2024 · Updated last year
- Official PyTorch implementation of "Facing the Elephant in the Room: Visual Prompt Tuning or Full Finetuning?" (ICLR 2024) ☆13 · Mar 8, 2024 · Updated 2 years ago
- (CVPR 2025) Official implementation of DELT: A Simple Diversity-driven EarlyLate Training for Dataset Distillation which outperforms SOTA… ☆27 · Aug 23, 2025 · Updated 9 months ago
- ☆11 · Nov 22, 2019 · Updated 6 years ago
- ☆11 · Aug 13, 2025 · Updated 9 months ago
- Breaking the SSL-AL Barrier: A Synergistic Semi-Supervised Active Learning Framework for 3D Object Detection ☆13 · Mar 23, 2025 · Updated last year
- v1: Learning to Point Visual Tokens for Multimodal Grounded Reasoning ☆19 · Oct 6, 2025 · Updated 7 months ago
- [CVPR 2026] OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models ☆81 · Apr 20, 2026 · Updated last month
- A simple semi-automatic labelling tool for semantic segmentation masks using SAM as support. ☆15 · Apr 17, 2024 · Updated 2 years ago
- SODA: Story Oriented Dense Video Captioning Evaluation Framework ☆14 · May 3, 2024 · Updated 2 years ago
- [CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts ☆23 · Apr 10, 2026 · Updated last month
- Repo for the paper: Towards Few-shot Entity Recognition in Document Images: A Label-aware Sequence-to-Sequence Framework ☆14 · May 31, 2023 · Updated 2 years ago
- Code repository for the paper "The Inherent Limits of Pretrained LLMs: The Unexpected Convergence of Instruction Tuning and In-Context Le… ☆14 · Jan 16, 2025 · Updated last year
- [ICCV'25] Method for generating static human-object interactions ☆43 · Oct 28, 2025 · Updated 6 months ago
- [ICCV2023] DR-Tune: Improving Fine-tuning of Pretrained Visual Models by Distribution Regularization with Semantic Calibration ☆12 · Oct 12, 2023 · Updated 2 years ago
- A simple utility to execute your deep learning scripts when there are enough idle GPUs ☆16 · Mar 22, 2022 · Updated 4 years ago
- Official PyTorch implementation of "Tool-R1: Sample-Efficient Reinforcement Learning for Agentic Tool Use" ☆20 · Sep 16, 2025 · Updated 8 months ago
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs ☆49 · May 7, 2026 · Updated 2 weeks ago
- https://avocado-captioner.github.io/ ☆35 · Oct 16, 2025 · Updated 7 months ago
- Official PyTorch implementation for "Unsupervised Camouflaged Object Detection via Adaptive Pseudo-label Learning and Dynamic Local Refin… ☆26 · Oct 27, 2025 · Updated 6 months ago
- [ECCV'24 Workshops Oral] DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling ☆31 · Feb 6, 2026 · Updated 3 months ago
- Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT" ☆45 · May 19, 2026 · Updated last week
- ☆42 · Jul 24, 2024 · Updated last year
- Python code to implement DeIL, a CLIP-based approach for open-world few-shot learning. ☆19 · Nov 4, 2024 · Updated last year
- ☆25 · Feb 8, 2025 · Updated last year