zhouyiks/CoLVA

☆44

Alternatives and similar repositories for CoLVA

Users who are interested in CoLVA are comparing it to the repositories listed below.

xushilin1 / dst-det
[TCSVT] state-of-the-art open vocabulary detector on COCO/LVIS/V3Det
☆35 · Jun 3, 2025 · Updated last year
yayafengzi / ALToLLM
ALTo: Adaptive-Length Tokenizer for Autoregressive Mask Generation
☆30 · May 27, 2025 · Updated last year
SkyworkAI / DAQ-VS
Code For Our Work: DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries [ECCV-2024]
☆15 · Jul 11, 2024 · Updated 2 years ago
path2generalist / General-Level
On Path to Multimodal Generalist: General-Level and General-Bench
☆21 · Jul 11, 2025 · Updated last year
marinero4972 / CyberV
☆20 · Jun 10, 2025 · Updated last year
yliu-cs / PiTe
[ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model
☆17 · Feb 13, 2025 · Updated last year
Haochen-Wang409 / TreeVGR
[ICLR'26] Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology
☆91 · Jan 26, 2026 · Updated 6 months ago
ByteDance-Seed / SAIL
Implementation for "The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer"
☆85 · Oct 29, 2025 · Updated 9 months ago
Tencent / HaploVLM
ICML2025
☆63 · Aug 28, 2025 · Updated 11 months ago
QingZhong1996 / Awesome-Video-Instance-Segmentation-Papers
☆36 · Oct 21, 2022 · Updated 3 years ago
aquastripe / DenseCLIP
An unofficial implementation for paper "DenseCLIP: Extract Free Dense Labels from CLIP"
☆24 · Jan 27, 2022 · Updated 4 years ago
zhang-tao-whu / DVIS_Plus
☆141 · Jul 4, 2024 · Updated 2 years ago
ShareLab-SII / CaTok
[CVPR-26] Official repository of "CaTok: Taming Mean Flows for One-Dimensional Causal Image Tokenization"
☆19 · Mar 9, 2026 · Updated 4 months ago
KAIST-Visual-AI-Group / APC-VLM
[ICCV 2025] Official code for Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation
☆66 · Sep 12, 2025 · Updated 10 months ago
akhilkedia / TranformersGetStable
[ICML 2024] Official Repository for the paper "Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models"
☆11 · Jul 19, 2024 · Updated 2 years ago
bytedance / Sa2VA
Official Repo For Pixel-LLM Codebase: Sa2VA (PAMI-26), SAMTok (CVPR-26), VRT (Arxiv-25), SaSaSa2VA (1-st solution for LSVOS)
☆1,650 · Updated this week
xushilin1 / RMP-SAM
[ICLR 2025 oral] RMP-SAM: Towards Real-Time Multi-Purpose Segment Anything
☆271 · Apr 11, 2025 · Updated last year
TIGER-AI-Lab / OmniEdit
Official Repo for Paper "OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision" [ICLR2025]
☆144 · Jan 27, 2025 · Updated last year
MajorDavidZhang / Generalization_unified_VLM
☆24 · May 23, 2025 · Updated last year
Haochen-Wang409 / Grasp-Any-Region
[ICLR'26] Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs
☆99 · Jan 26, 2026 · Updated 6 months ago
scofield7419 / MUIE-REAMO
Code of the Grounded MUIE model, REAMO
☆11 · Dec 3, 2024 · Updated last year
marinero4972 / Open-o3-Video
[ICML 2026] Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"
☆157 · May 1, 2026 · Updated 2 months ago
Andy-Cheng / TEMPURA
TEMPURA enables video-language models to reason about causal event relationships and generate fine-grained, timestamped descriptions of u…
☆27 · Jun 4, 2025 · Updated last year
godx-7 / DeH4R
The official repo of the paper titled DeH4R: A Decoupled and Hybrid Method for Road Network Graph Extraction.
☆23 · May 25, 2026 · Updated 2 months ago
TencentARC / Video-Holmes
[ECCV 2026] Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?
☆95 · Jul 13, 2025 · Updated last year
NVlabs / SpaceTools-Toolshed
☆16 · Mar 24, 2026 · Updated 4 months ago
ChocoWu / SeTok
Codes for ICLR 2025 Paper: Towards Semantic Equivalence of Tokenization in Multimodal LLM
☆81 · Apr 19, 2025 · Updated last year
amitakamath / vl_text_encoders_are_bottlenecks
Code and datasets for "Text encoders are performance bottlenecks in contrastive vision-language models". Coming soon!
☆11 · May 24, 2023 · Updated 3 years ago
LilyDaytoy / OpenPVSG
Benchmarking Panoptic Video Scene Graph Generation (PVSG), CVPR'23
☆104 · Apr 30, 2024 · Updated 2 years ago
ys-zong / MIRB
Benchmarking Multi-Image Understanding in Vision and Language Models
☆11 · Jul 29, 2024 · Updated last year
zhang-tao-whu / P2PFormer
☆32 · Dec 3, 2024 · Updated last year
lxtGH / Tube-Link
[ICCV-2023] Universal Video Segmentation For VSS, VPS and VIS
☆109 · Mar 18, 2024 · Updated 2 years ago
snap-research / VIMI
☆13 · Jul 10, 2024 · Updated 2 years ago
XenoZLH / Shuffle-R1
Official code repository of Shuffle-R1
☆26 · Feb 23, 2026 · Updated 5 months ago
jihaonew / MM-Instruct
MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment
☆35 · Jul 1, 2024 · Updated 2 years ago
danberlyne / yt-playlist-generator
Generates a YouTube playlist from a list of URLs.
☆10 · Aug 14, 2023 · Updated 2 years ago
kongdai123 / consistency2
☆16 · Jun 14, 2024 · Updated 2 years ago
jinxiang-liu / UFE-AVS
Official code for CVPR 2024 paper, "Audio-Visual Segmentation via Unlabeled Frame Exploitation"
☆19 · Jul 7, 2024 · Updated 2 years ago
lxtGH / DenseWorld-1M
Code and dataset link for "DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World"
☆129 · Oct 2, 2025 · Updated 9 months ago