MikeWangWZHL/Paxion

Repo for the paper "Paxion: Patching Action Knowledge in Video-Language Foundation Models" (NeurIPS 2023 Spotlight)

☆38

Alternatives and similar repositories for Paxion

Users who are interested in Paxion are comparing it to the repositories listed below.

MikeWangWZHL / VidIL
Pytorch code for Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
☆117 · Sep 15, 2022 · Updated 3 years ago
soCzech / LookForTheChange
Code for Look for the Change paper published at CVPR 2022
☆36 · Oct 26, 2022 · Updated 3 years ago
dmoltisanti / air-cvpr23
This repository contains the Adverbs in Recipes (AIR) dataset and the code published at the CVPR 23 paper: "Learning Action Changes by Me…
☆13 · May 25, 2023 · Updated 3 years ago
gicheonkang / sglkt-visdial
🌈 PyTorch Implementation for EMNLP'21 Findings "Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer"
☆13 · Feb 1, 2023 · Updated 3 years ago
CPF-NLPR / ULGN4DocEFI
☆10 · Nov 14, 2021 · Updated 4 years ago
zinengtang / Perceiver_VL
PyTorch code for "Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention" (WACV 2023)
☆34 · Feb 5, 2023 · Updated 3 years ago
wwwfan628 / DA-AIM
DA-AIM: Exploiting Instance-based Mixed Sampling via Auxiliary Source Domain Supervision for Domain-adaptive Action Detection
☆12 · Oct 6, 2022 · Updated 3 years ago
kdariina / CLIP-not-BoW-unimodally
Code for "CLIP Behaves like a Bag-of-Words Model Cross-modally but not Uni-modally"
☆29 · Feb 27, 2026 · Updated 5 months ago
llyx97 / TempCompass
[ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …
☆133 · Apr 4, 2025 · Updated last year
amitakamath / vl_text_encoders_are_bottlenecks
Code and datasets for "Text encoders are performance bottlenecks in contrastive vision-language models". Coming soon!
☆11 · May 24, 2023 · Updated 3 years ago
Nmegha2601 / anticipatr
☆12 · Apr 6, 2023 · Updated 3 years ago
Hritikbansal / videocon
☆58 · Apr 24, 2024 · Updated 2 years ago
zihuixue / seeAoT
Code and data release for the paper "Seeing the Arrow of Time in Large Multimodal Models"
☆16 · Oct 2, 2025 · Updated 9 months ago
StanfordVL / atp-video-language
Official repo for CVPR 2022 (Oral) paper: Revisiting the "Video" in Video-Language Understanding. Contains code for the Atemporal Probe (…
☆51 · May 29, 2024 · Updated 2 years ago
wlin-at / ViTTA
Video Test-Time Adaptation for Action Recognition (CVPR 2023)
☆53 · Oct 13, 2024 · Updated last year
zihuixue / MKE
[ICCV 2021] Multimodal Knowledge Expansion
☆10 · Aug 28, 2021 · Updated 4 years ago
Yui010206 / SeViLA
[NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering
☆197 · Jan 14, 2024 · Updated 2 years ago
NVlabs / PALAVRA
☆54 · Jul 31, 2022 · Updated 3 years ago
shuheikurita / RefEgo
☆13 · Jul 20, 2024 · Updated 2 years ago
florianHofherr / PhysParamInference
☆19 · Jan 30, 2023 · Updated 3 years ago
rajnish-aggarwal / Emotion-recognition-using-audio-and-video-on-RAVDES-dataset
☆12 · May 19, 2019 · Updated 7 years ago
ZuyiZhou / Awesome-Interpretable-Cross-modal-Reasoning
A Survey on Interpretable Cross-modal Reasoning
☆15 · Oct 12, 2023 · Updated 2 years ago
bpiyush / TestOfTime
Official code for our CVPR 2023 paper: Test of Time: Instilling Video-Language Models with a Sense of Time
☆46 · Jun 11, 2024 · Updated 2 years ago
facebookresearch / VidOSC
Code and data release for the paper "Learning Object State Changes in Videos: An Open-World Perspective" (CVPR 2024)
☆37 · Sep 9, 2024 · Updated last year
ilkerkesen / ViLMA
ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models (ICLR 2024, Official Implementation)
☆16 · Jan 18, 2024 · Updated 2 years ago
mshukor / ima-lmms
[NeurIPS2024] Official code for (IMA) Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs
☆23 · Oct 15, 2024 · Updated last year
Vinoground / Vinoground
☆13 · Apr 13, 2026 · Updated 3 months ago
lscpku / VITATECS
☆18 · Jul 10, 2024 · Updated 2 years ago
danielchyeh / this-is-my
Official This-Is-My Dataset published in CVPR 2023
☆16 · Jul 18, 2024 · Updated 2 years ago
ninatu / howtocaption
Official implementation of "HowToCaption: Prompting LLMs to Transform Video Annotations at Scale." ECCV 2024
☆59 · Aug 19, 2025 · Updated 11 months ago
TencentARC / TVTS
Turning to Video for Transcript Sorting
☆49 · Aug 27, 2023 · Updated 2 years ago
OpenGVLab / efficient-video-recognition
☆184 · Aug 20, 2022 · Updated 3 years ago
fmthoker / SEVERE-BENCHMARK
☆26 · Aug 31, 2023 · Updated 2 years ago
agneet42 / revision
[ECCV 2024] "REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models"
☆14 · Aug 6, 2024 · Updated last year
Yuliang-Zou / InstCal-Pano
[ECCV 2022] Learning Instance-Specific Adaptation for Cross-Domain Segmentation
☆14 · Jul 17, 2022 · Updated 4 years ago
PaulLerner / ViQuAE
Source code and data used in the papers ViQuAE (Lerner et al., SIGIR'22), Multimodal ICT (Lerner et al., ECIR'23) and Cross-modal Retriev…
☆39 · Dec 19, 2024 · Updated last year
ethanlshen / HierNet
Code for "Are “Hierarchical” Visual Representations Hierarchical?" in NeurIPS Workshop for Symmetry and Geometry in Neural Representation…
☆23 · Nov 8, 2023 · Updated 2 years ago
jayleicn / singularity
[ACL 2023] Official PyTorch code for Singularity model in "Revealing Single Frame Bias for Video-and-Language Learning"
☆136 · May 5, 2023 · Updated 3 years ago
tandav / pitch-detectors
collection of pitch (f0, fundamental frequency) detection algorithms with unified interface
☆25 · Nov 25, 2024 · Updated last year