wdrink/OmniVid

☆58

Alternatives and similar repositories for OmniVid

Users who are interested in OmniVid are comparing it to the libraries listed below.

JPShi12 / VideoLoom
[ICML 2026] VideoLoom: A Video Large Language Model for Joint Spatial-Temporal Understanding
☆27 · Jul 3, 2026 · Updated 2 weeks ago
wdrink / OpenTokenizer
☆21 · Jan 17, 2025 · Updated last year
wdrink / ARM
ARM: An AutoRegressive Large Multimodal Model with Discrete Representations
☆50 · Jun 10, 2026 · Updated last month
X2FD / LVIS-INSTRUCT4V
☆134 · Dec 22, 2023 · Updated 2 years ago
adamobeng / schemagen
Make tool-calling schemas for existing tools
☆14 · Mar 8, 2025 · Updated last year
joslefaure / HERMES
[ICCV'25] HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics
☆37 · Sep 10, 2025 · Updated 10 months ago
OpenGVLab / video-mamba-suite
The suite of modeling video with Mamba
☆295 · May 14, 2024 · Updated 2 years ago
LinglingCai0314 / FreeMask
☆11 · Jan 18, 2025 · Updated last year
ShuaiyiHuang / SCorrSAN
Official Code for ECCV2022: Learning Semantic Correspondence with Sparse Annotations
☆18 · Aug 22, 2022 · Updated 3 years ago
buxiangzhiren / VD-IT
Code for the paper "Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation", ECCV 2024
☆48 · Sep 28, 2024 · Updated last year
yoxu515 / VIPOSeg-Benchmark
The benchmark for "Video Object Segmentation in Panoptic Wild Scenes".
☆12 · Oct 17, 2023 · Updated 2 years ago
boheumd / MA-LMM
(2024CVPR) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
☆350 · Jul 19, 2024 · Updated 2 years ago
yexf308 / MachineLearning
Machine Learning Course From Scratch
☆13 · Jul 24, 2024 · Updated last year
ShareLab-SII / FluxMem
[CVPR 2026] FluxMem: Adaptive Hierarchical Memory for Streaming Video Understanding
☆73 · Mar 16, 2026 · Updated 4 months ago
SkyworkAI / DAQ-VS
Code For Our Work: DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries [ECCV-2024]
☆15 · Jul 11, 2024 · Updated 2 years ago
wdrink / RepWAM
Code for RepWAM: World Action Modeling with Representation Visual-Action Tokenizers
☆58 · Jun 14, 2026 · Updated last month
rxtan2 / Koala-video-llm
☆37 · Sep 16, 2024 · Updated last year
MinghanLi / BoxVIS
Code release for "BoxVIS: Video Instance Segmentation with Box Annotation"
☆12 · Dec 22, 2023 · Updated 2 years ago
RU-System-Software-and-Security / NIC
☆12 · Mar 24, 2023 · Updated 3 years ago
aspirinone / CATR.github.io
☆31 · Mar 1, 2024 · Updated 2 years ago
dibschat / ProVideLLM
[ICCV 2025] Streaming VideoLLMs for Real-time Procedural Video Understanding
☆18 · Oct 26, 2025 · Updated 8 months ago
yingsen1 / UniMD
UniMD: Towards Unifying Moment retrieval and temporal action Detection
☆57 · Jul 5, 2024 · Updated 2 years ago
zjr2000 / LLMVA-GEBC
Winner solution to Generic Event Boundary Captioning task in LOVEU Challenge (CVPR 2023 workshop)
☆29 · Jan 1, 2024 · Updated 2 years ago
antoyang / VidChapters
[NeurIPS 2023 D&B] VidChapters-7M: Video Chapters at Scale
☆211 · Nov 13, 2023 · Updated 2 years ago
MengLcool / SliMM
☆25 · Dec 26, 2024 · Updated last year
Junxi-Chen / PE-MIL
[CVPR 2024] Official code for paper: Prompt-Enhanced Multiple Instance Learning for Weakly Supervised Video Anomaly Detection.
☆27 · Aug 19, 2024 · Updated last year
callsys / TextVR
[PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension
☆31 · Dec 28, 2023 · Updated 2 years ago
huangb23 / VTimeLLM
[CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".
☆295 · Jun 13, 2024 · Updated 2 years ago
TencentARC / ST-LLM
[ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners"
☆153 · Sep 10, 2024 · Updated last year
md-mohaiminul / VideoRecap
☆208 · Jul 12, 2024 · Updated 2 years ago
OpenGVLab / InternVideo
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
☆2,339 · Jul 2, 2026 · Updated 3 weeks ago
yhy-2000 / MomentSeeker
☆23 · Jul 23, 2025 · Updated last year
amitrana001 / DynaMITe
Official repo for "DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive Segmentation Transformer"
☆19 · Sep 29, 2023 · Updated 2 years ago
MengLcool / SEGIC
[ECCV-24] This is the official implementation of the paper "SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation".
☆27 · Oct 13, 2024 · Updated last year
yoxu515 / MITS
☆21 · Jul 25, 2024 · Updated last year
lntzm / HICom
[CVPR2025] Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models
☆21 · Apr 30, 2025 · Updated last year
Agora-Lab-AI / SRT
An open-source non-official community implementation of the model from the paper: Surgical Robot Transformer (SRT): Imitation Learning fo…
☆13 · Updated this week
kay-ck / GCMA
[ACM MM2023] Code Release of GCMA: Generative Cross-Modal Transferable Adversarial Attacks from Images to Videos
☆12 · Mar 29, 2024 · Updated 2 years ago
ronghanghu / vqa-maskrcnn-benchmark-m4c
Used in M4C feature extraction script: https://github.com/facebookresearch/mmf/blob/project/m4c/projects/M4C/scripts/extract_ocr_frcn_fea…
☆13 · Jan 30, 2020 · Updated 6 years ago