VideoX: a collection of video cross-modal models
☆1,061 · Jun 3, 2024 · Updated last year
Alternatives and similar repositories for VideoX
Users who are interested in VideoX are comparing it to the repositories listed below.
- An optimized re-implementation for 2D-TAN: Learning 2D Temporal Localization Networks for Moment Localization with Natural Language (AAAI… ☆128 · Apr 1, 2023 · Updated 2 years ago
- Cross-Modal Interaction Networks for Query-Based Moment Retrieval in Videos ☆87 · Nov 22, 2020 · Updated 5 years ago
- Repository for the CVPR-20 paper "Local-Global Video-Text Interactions for Temporal Grounding" ☆131 · Jul 5, 2021 · Updated 4 years ago
- TALL: Temporal Activity Localization via Language Query ☆217 · Mar 15, 2018 · Updated 7 years ago
- Span-based Localizing Network for Natural Language Video Localization (ACL 2020) ☆112 · Oct 15, 2021 · Updated 4 years ago
- Official Tensorflow Implementation of the AAAI-2020 paper "Temporally Grounding Language Queries in Videos by Contextual Boundary-aware P… ☆59 · Mar 24, 2023 · Updated 2 years ago
- The official implementation of the paper "ActionCLIP: A New Paradigm for Action Recognition" ☆602 · Dec 6, 2023 · Updated 2 years ago
- ☆181 · Aug 20, 2022 · Updated 3 years ago
- Weakly Supervised Video Moment Retrieval from Text Queries ☆43 · Jul 20, 2020 · Updated 5 years ago
- ☆193 · Oct 22, 2022 · Updated 3 years ago
- [NeurIPS 2021] Moment-DETR code and QVHighlights dataset ☆342 · Apr 18, 2024 · Updated last year
- [AAAI 2022] Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding ☆91 · Nov 16, 2022 · Updated 3 years ago
- Code for the paper: Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos ☆71 · Sep 7, 2021 · Updated 4 years ago
- [CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning… ☆723 · Aug 8, 2023 · Updated 2 years ago
- An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval" ☆1,025 · Apr 12, 2024 · Updated last year
- Official pytorch implementation of "Explore-And-Match: Bridging Proposal-Based and Proposal-Free With Transformer for Sentence Grounding … ☆42 · Aug 5, 2022 · Updated 3 years ago
- PyTorch implementation of paper "ARTrack" and "ARTrackV2" ☆301 · Oct 20, 2025 · Updated 4 months ago
- [CVPR 2023] Official repository of paper titled "Fine-tuned CLIP models are efficient video learners". ☆304 · Apr 3, 2024 · Updated last year
- [ECCV 2020] PyTorch code for XML on TVRetrieval dataset - TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval ☆161 · May 28, 2024 · Updated last year
- Continuously updated list of frontier papers on video moment localization, temporal language grounding, or video clip retrieval. ☆261 · Aug 26, 2023 · Updated 2 years ago
- Dense Regression Network for Video Grounding (CVPR2020) ☆53 · Jan 28, 2021 · Updated 5 years ago
- [CVPR2023] All in One: Exploring Unified Video-Language Pre-training ☆281 · Mar 25, 2023 · Updated 2 years ago
- [CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transformers ☆193 · Sep 24, 2023 · Updated 2 years ago
- ☆36 · Apr 14, 2021 · Updated 4 years ago
- [NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training ☆1,681 · Dec 8, 2023 · Updated 2 years ago
- "Video Moment Retrieval from Text Queries via Single Frame Annotation" in SIGIR 2022 ☆69 · Jun 27, 2022 · Updated 3 years ago
- Official repository for "Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting" [CVPR 2023] ☆127 · Jul 1, 2023 · Updated 2 years ago
- source code of our RaNet in EMNLP 2021 ☆30 · May 31, 2022 · Updated 3 years ago
- A PyTorch implementation of some state-of-the-art models for "Temporally Language Grounding in Untrimmed Videos" ☆95 · Sep 21, 2019 · Updated 6 years ago
- Align and Prompt: Video-and-Language Pre-training with Entity Prompts ☆188 · May 1, 2025 · Updated 9 months ago
- The source code of the paper: "To Find Where You Talk: Temporal Sentence Localization in Video with Attention Based Location Regression" ☆30 · Jan 8, 2019 · Updated 7 years ago
- Official PyTorch implementation of the paper "Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring" ☆107 · Jan 28, 2024 · Updated 2 years ago
- ☆26 · Aug 4, 2020 · Updated 5 years ago
- PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models. ☆7,297 · Feb 19, 2026 · Updated last week
- A PyTorch implementation of the paper "BMN: Boundary-Matching Network for Temporal Action Proposal Generation", which is ac… ☆299 · Dec 5, 2021 · Updated 4 years ago
- This is an official implementation for "Video Swin Transformers". ☆1,632 · Mar 8, 2023 · Updated 2 years ago
- A curated list of temporal action localization/detection and related area (e.g. temporal action proposal) resources. ☆587 · Sep 22, 2022 · Updated 3 years ago
- 【AAAI'2023 & IJCV】Transferring Vision-Language Models for Visual Recognition: A Classifier Perspective ☆198 · May 30, 2024 · Updated last year
- [ECCV2024] Video Foundation Models & Data for Multimodal Understanding ☆2,201 · Dec 15, 2025 · Updated 2 months ago