Yui010206/SeViLA

[NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering

☆198

Alternatives and similar repositories for SeViLA

Users who are interested in SeViLA are comparing it to the repositories listed below.

antoyang / FrozenBiLM
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
☆159 · Dec 9, 2024 · Updated last year
doc-doc / NExT-GQA
Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)
☆89 · Jul 1, 2024 · Updated 2 years ago
yl3800 / TranSTR
☆12 · Dec 15, 2023 · Updated 2 years ago
showlab / mist
☆37 · Dec 20, 2023 · Updated 2 years ago
VRU-NExT / VideoQA
☆104 · Oct 19, 2022 · Updated 3 years ago
j-min / HiREST
Hierarchical Video-Moment Retrieval and Step-Captioning (CVPR 2023)
☆110 · Jan 23, 2025 · Updated last year
Yui010206 / CREMA
[ICLR 2025] CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
☆56 · Jul 1, 2025 · Updated last year
doc-doc / CoVGT
Contrastive Video Question Answering via Video Graph Transformer (IEEE T-PAMI'23)
☆20 · Mar 9, 2024 · Updated 2 years ago
doc-doc / NExT-QA
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)
☆189 · Aug 2, 2025 · Updated 11 months ago
RenShuhuai-Andy / TimeChat
[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
☆425 · May 8, 2025 · Updated last year
klauscc / VindLU
☆108 · Dec 23, 2022 · Updated 3 years ago
mlvlab / Flipped-VQA
Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)
☆77 · Mar 26, 2025 · Updated last year
zjuchenlong / WSAG
[EMNLP'22] Weakly-Supervised Temporal Article Grounding
☆14 · Nov 25, 2023 · Updated 2 years ago
doc-doc / HQGA
Video as Conditional Graph Hierarchy for Multi-Granular Question Answering (AAAI'22, Oral)
☆35 · Sep 17, 2022 · Updated 3 years ago
showlab / all-in-one
[CVPR2023] All in One: Exploring Unified Video-Language Pre-training
☆281 · Mar 25, 2023 · Updated 3 years ago
MikeWangWZHL / VidIL
Pytorch code for Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
☆117 · Sep 15, 2022 · Updated 3 years ago
wenhaochai / MovieChat
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
☆704 · Jan 29, 2025 · Updated last year
minghangz / cpl
CPL: Weakly Supervised Temporal Sentence Grounding with Gaussian-based Contrastive Proposal Learning
☆65 · Mar 22, 2026 · Updated 4 months ago
Becomebright / GroundVQA
Official PyTorch code of GroundVQA (CVPR'24)
☆63 · Sep 13, 2024 · Updated last year
Ziyang412 / VideoTree
Code for CVPR25 paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"
☆165 · Jun 23, 2025 · Updated last year
CeeZh / LLoVi
Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering"
☆106 · Oct 27, 2024 · Updated last year
yl3800 / IGV
This repo contains code for Invariant Grounding for Video Question Answering
☆27 · Mar 2, 2023 · Updated 3 years ago
Ziyang412 / UCoFiA
Pytorch Code for "Unified Coarse-to-Fine Alignment for Video-Text Retrieval" (ICCV 2023)
☆66 · Jun 7, 2024 · Updated 2 years ago
joslefaure / HERMES
[ICCV'25] HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics
☆37 · Sep 10, 2025 · Updated 10 months ago
Yui010206 / MoPRL
[TCSVT] Regularity Learning via Explicit Distribution Modeling for Skeletal Video Anomaly Detection
☆17 · Jul 22, 2023 · Updated 3 years ago
jinhyunj / EaTR
Official pytorch repository for "Knowing Where to Focus: Event-aware Transformer for Video Grounding" (ICCV 2023)
☆55 · Sep 7, 2023 · Updated 2 years ago
jayleicn / singularity
[ACL 2023] Official PyTorch code for Singularity model in "Revealing Single Frame Bias for Video-and-Language Learning"
☆136 · May 5, 2023 · Updated 3 years ago
zinengtang / TVLT
PyTorch code for “TVLT: Textless Vision-Language Transformer” (NeurIPS 2022 Oral)
☆127 · Feb 24, 2023 · Updated 3 years ago
showlab / EgoVLP
[NeurIPS 2022] Egocentric Video-Language Pretraining
☆261 · May 9, 2024 · Updated 2 years ago
imagegridworth / IG-VLM
☆138 · Sep 29, 2024 · Updated last year
StanfordVL / atp-video-language
Official repo for CVPR 2022 (Oral) paper: Revisiting the "Video" in Video-Language Understanding. Contains code for the Atemporal Probe (…
☆51 · May 29, 2024 · Updated 2 years ago
MikeWangWZHL / Paxion
Repo for paper: "Paxion: Patching Action Knowledge in Video-Language Foundation Models" Neurips 23 Spotlight
☆38 · May 23, 2023 · Updated 3 years ago
DAMO-NLP-SG / Video-LLaMA
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
☆3,140 · Jun 4, 2024 · Updated 2 years ago
Ziyang412 / Video-RTS
Code for EMNLP25 paper "Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning"
☆24 · Feb 18, 2026 · Updated 5 months ago
mbzuai-oryx / Video-ChatGPT
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the cap…
☆1,503 · Aug 5, 2025 · Updated 11 months ago
snumprlab / isr-dpo
Official Implementation of ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO (AAAI'25)
☆23 · Nov 25, 2025 · Updated 7 months ago
ylsung / VL_adapter
PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR2022)
☆212 · Dec 18, 2022 · Updated 3 years ago
EasonXiao-888 / UVCOM
[CVPR 2024] Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection
☆117 · Jul 17, 2024 · Updated 2 years ago
sail-sg / VGT
Video Graph Transformer for Video Question Answering (ECCV'22)
☆49 · Jun 8, 2023 · Updated 3 years ago