dhg-wei/TOPA

(NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment

☆29

Alternatives and similar repositories for TOPA

Users that are interested in TOPA are comparing it to the libraries listed below.

MGitHubL / TMac
☆14 · Feb 26, 2024 · Updated 2 years ago
dhg-wei / MCL
(ICML 2024) Improve Context Understanding in Multimodal Large Language Models via Multimodal Composition Learning
☆28 · Sep 27, 2024 · Updated last year
dhg-wei / DeCap
ICLR 2023 DeCap: Decoding CLIP Latents for Zero-shot Captioning
☆144 · Mar 16, 2023 · Updated 3 years ago
Vinoground / Vinoground
☆13 · Apr 13, 2026 · Updated 3 months ago
zhiyuanhubj / Long_form_VideoQA
[EMNLP’24 Main] Encoding and Controlling Global Semantics for Long-form Video Question Answering
☆18 · Oct 9, 2024 · Updated last year
WHB139426 / GCG
Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering [ACM MM'24]
☆10 · Jul 22, 2024 · Updated 2 years ago
jongwoopark7978 / LVNet
[Main Conference @ EACL'26] [Workshop @ NeurIPS'24] 🎞️ LVNet.
☆44 · Feb 10, 2026 · Updated 5 months ago
adobe-research / llava-score
☆11 · Oct 2, 2024 · Updated last year
WangWenhao0716 / ASL
[AAAI 2023] The official implementation of "A Benchmark and Asymmetrical-Similarity Learning for Practical Image Copy Detection"
☆22 · Jan 24, 2025 · Updated last year
yuexihang / DeltaPhi
Implementation for "DeltaPhi: Learning Physical Trajectory Residual for PDE Solving"
☆13 · Jun 17, 2024 · Updated 2 years ago
iOPENCap / awesome-unimodal-training
text-only training or language-free training for multimodal tasks (image/audio/video caption, retrieval, text2image)
☆12 · Oct 15, 2024 · Updated last year
amitakamath / vl_text_encoders_are_bottlenecks
Code and datasets for "Text encoders are performance bottlenecks in contrastive vision-language models". Coming soon!
☆11 · May 24, 2023 · Updated 3 years ago
llyx97 / TempCompass
[ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …
☆133 · Apr 4, 2025 · Updated last year
VamosC / CoLearning-meet-StitchUp
[TIP 2023] Co-Learning Meets Stitch-Up for Noisy Multi-label Visual Recognition.
☆13 · Aug 19, 2023 · Updated 2 years ago
leonnnop / VAR
[CVPR 2022] Visual Abductive Reasoning
☆124 · Oct 22, 2024 · Updated last year
zzhhfut / CCNet-AAAI2025
This repository contains code for AAAI2025 paper "Dense Audio-Visual Event Localization under Cross-Modal Consistency and Multi-Temporal …
☆24 · Aug 18, 2025 · Updated 11 months ago
cg1177 / Recursive-Multimodal-Agent
☆19 · Jul 1, 2026 · Updated 3 weeks ago
liruiw / Dec-SSL
Understanding Self-Supervised Learning in a non-IID Setting
☆21 · Oct 21, 2022 · Updated 3 years ago
longvideobench / LongVideoBench
[Neurips 24' D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.
☆133 · Jul 27, 2024 · Updated last year
ItemZheng / KDDAug
[ECCV2022] Rethinking Data Augmentation for Robust Visual Question Answering
☆13 · Nov 23, 2022 · Updated 3 years ago
yl3800 / TranSTR
☆12 · Dec 15, 2023 · Updated 2 years ago
TAU-VAILab / hierarcaps
Code and data for the paper "Emergent Visual-Semantic Hierarchies in Image-Text Representations" (ECCV 2024)
☆34 · Aug 12, 2024 · Updated last year
kahnchana / mvu
🤖 [ICLR'25] Multimodal Video Understanding Framework (MVU)
☆58 · Jan 31, 2025 · Updated last year
uvavision / SyViC
[ICCV 2023] Going Beyond Nouns With Vision & Language Models Using Synthetic Data
☆13 · Sep 30, 2023 · Updated 2 years ago
yuangan / evaluation_eat
Evaluation code for "Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation"
☆19 · Mar 10, 2024 · Updated 2 years ago
WangWenhao0716 / AnyPattern
[IJCV 2025] The official implementation of "AnyPattern: Towards In-context Image Copy Detection"
☆11 · Oct 24, 2025 · Updated 8 months ago
Yuliang-Zou / InstCal-Pano
[ECCV 2022] Learning Instance-Specific Adaptation for Cross-Domain Segmentation
☆14 · Jul 17, 2022 · Updated 4 years ago
kahnchana / LangToMo
[WIP] Code for LangToMo
☆21 · Mar 19, 2026 · Updated 4 months ago
mu-cai / TemporalBench
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models
☆40 · Nov 10, 2024 · Updated last year
PolyU-ChenLab / ETBench
👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024)
☆74 · Jan 20, 2025 · Updated last year
yongliang-wu / MM-VID
Open source implementation of the paper "MM-Vid: Advancing Video Understanding with GPT-4V(ision)".
☆44 · Jan 4, 2026 · Updated 6 months ago
aktsonthalia / starlight
Source code for the paper "Do Deep Neural Network Solutions form a Star Domain?"
☆12 · May 26, 2024 · Updated 2 years ago
showlab / mist
☆37 · Dec 20, 2023 · Updated 2 years ago
kdariina / CLIP-not-BoW-unimodally
Code for "CLIP Behaves like a Bag-of-Words Model Cross-modally but not Uni-modally"
☆29 · Feb 27, 2026 · Updated 4 months ago
ChenyuHeidiZhang / VL-commonsense
☆14 · May 23, 2022 · Updated 4 years ago
z-x-yang / DoraemonGPT
Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models
☆91 · Jun 19, 2026 · Updated last month
Ziyang412 / VideoTree
Code for CVPR25 paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"
☆165 · Jun 23, 2025 · Updated last year
FuxiaoLiu / LRV-Instruction
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
☆297 · Mar 13, 2024 · Updated 2 years ago
lianshiwei / datavisualization.github.io
Visualization of China's historical GDP and population data
☆13 · Jan 18, 2023 · Updated 3 years ago