(NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment
☆29 · Sep 27, 2024 · Updated last year
Alternatives and similar repositories for TOPA
Users that are interested in TOPA are comparing it to the libraries listed below.
- ☆14 · Feb 26, 2024 · Updated 2 years ago
- [Main Conference @ EACL'26] [Workshop @ NeurIPS'24] LVNet. ☆43 · Feb 10, 2026 · Updated 4 months ago
- ICLR 2023 DeCap: Decoding CLIP Latents for Zero-shot Captioning ☆143 · Mar 16, 2023 · Updated 3 years ago
- (ICML 2024) Improve Context Understanding in Multimodal Large Language Models via Multimodal Composition Learning ☆28 · Sep 27, 2024 · Updated last year
- [EMNLP'24 Main] Encoding and Controlling Global Semantics for Long-form Video Question Answering ☆18 · Oct 9, 2024 · Updated last year
- ☆13 · Apr 13, 2026 · Updated last month
- Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering [ACM MM'24] ☆10 · Jul 22, 2024 · Updated last year
- ☆11 · Oct 2, 2024 · Updated last year
- [AAAI 2023] The official implementation of "A Benchmark and Asymmetrical-Similarity Learning for Practical Image Copy Detection" ☆22 · Jan 24, 2025 · Updated last year
- This repository contains code for AAAI2025 paper "Dense Audio-Visual Event Localization under Cross-Modal Consistency and Multi-Temporal … ☆24 · Aug 18, 2025 · Updated 9 months ago
- Implementation for "DeltaPhi: Learning Physical Trajectory Residual for PDE Solving" ☆13 · Jun 17, 2024 · Updated last year
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, … ☆131 · Apr 4, 2025 · Updated last year
- Code and datasets for "Text encoders are performance bottlenecks in contrastive vision-language models". Coming soon! ☆11 · May 24, 2023 · Updated 3 years ago
- text-only training or language-free training for multimodal tasks (image/audio/video caption, retrieval, text2image) ☆12 · Oct 15, 2024 · Updated last year
- [TIP 2023] Co-Learning Meets Stitch-Up for Noisy Multi-label Visual Recognition. ☆13 · Aug 19, 2023 · Updated 2 years ago
- [EMNLP 2024] A Video Chat Agent with Temporal Prior ☆33 · Mar 2, 2025 · Updated last year
- [NeurIPS 2025] Code for BEAST Experiments on CALVIN and LIBERO. ☆38 · Jan 8, 2026 · Updated 5 months ago
- Understanding Self-Supervised Learning in a non-IID Setting ☆21 · Oct 21, 2022 · Updated 3 years ago
- Code for "CLIP Behaves like a Bag-of-Words Model Cross-modally but not Uni-modally" ☆29 · Feb 27, 2026 · Updated 3 months ago
- TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models ☆40 · Nov 10, 2024 · Updated last year
- Code and data for the paper "Emergent Visual-Semantic Hierarchies in Image-Text Representations" (ECCV 2024) ☆34 · Aug 12, 2024 · Updated last year
- [ICLR'25] Multimodal Video Understanding Framework (MVU) ☆56 · Jan 31, 2025 · Updated last year
- [NeurIPS'24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench. ☆130 · Jul 27, 2024 · Updated last year
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs ☆56 · Mar 9, 2025 · Updated last year
- [ECCV2022] Rethinking Data Augmentation for Robust Visual Question Answering ☆13 · Nov 23, 2022 · Updated 3 years ago
- [ACL 2025 Findings] Official pytorch implementation of "Don't Miss the Forest for the Trees: Attentional Vision Calibration for Large Vis… ☆25 · Jul 21, 2024 · Updated last year
- [CVPR 2025] Official PyTorch code of "Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation". ☆57 · May 25, 2025 · Updated last year
- Evaluation code for "Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation" ☆18 · Mar 10, 2024 · Updated 2 years ago
- [ICCV 2023] Going Beyond Nouns With Vision & Language Models Using Synthetic Data ☆13 · Sep 30, 2023 · Updated 2 years ago
- [ECCV 2022] Learning Instance-Specific Adaptation for Cross-Domain Segmentation ☆14 · Jul 17, 2022 · Updated 3 years ago
- TTRV: Test-Time Reinforcement Learning for Vision–Language Models (CVPR 2026) ☆43 · Mar 8, 2026 · Updated 3 months ago
- [ICML 2024] "Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models" ☆59 · Sep 3, 2024 · Updated last year
- MR. Video: MapReduce is the Principle for Long Video Understanding ☆31 · Apr 23, 2025 · Updated last year
- Code for "Holistic Physics Solver: Learning PDEs in a Unified Spectral-Physical Space" ☆24 · Mar 25, 2026 · Updated 2 months ago
- [WIP] Code for LangToMo ☆21 · Mar 19, 2026 · Updated 2 months ago
- A collection on the recent reproduction papers and projects on DeepSeek-R1 ☆31 · Feb 27, 2025 · Updated last year
- Official PyTorch implementation of "Query-Efficient Black-Box Red Teaming via Bayesian Optimization" (ACL'23) ☆15 · Jul 9, 2023 · Updated 2 years ago
- E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024) ☆74 · Jan 20, 2025 · Updated last year
- Official code for "RAG Meets Temporal Graphs: Time-Sensitive Modeling and Retrieval for Evolving Knowledge". ☆34 · Feb 25, 2026 · Updated 3 months ago