FingerRec/OA-Transformer

[CVPR 2022] The code for our paper "Object-aware Video-language Pre-training for Retrieval"

☆61

Alternatives and similar repositories for OA-Transformer

Users that are interested in OA-Transformer are comparing it to the libraries listed below.

showlab / DemoVLP
[Arxiv2022] Revitalize Region Feature for Democratizing Video-Language Pre-training
☆22 • Mar 19, 2022 • Updated 4 years ago
showlab / Region_Learner
The Pytorch implementation for "Video-Text Pre-training with Learned Regions"
☆43 • Jul 15, 2022 • Updated 4 years ago
sangminwoo / Explore-And-Match
Official pytorch implementation of "Explore-And-Match: Bridging Proposal-Based and Proposal-Free With Transformer for Sentence Grounding …
☆42 • Aug 5, 2022 • Updated 3 years ago
VALUE-Leaderboard / DataRelease
Data Release for VALUE Benchmark
☆30 • Feb 16, 2022 • Updated 4 years ago
princetonvisualai / MQVR
☆26 • Jan 12, 2022 • Updated 4 years ago
TencentARC / MCQ
Official code for "Bridging Video-text Retrieval with Multiple Choice Questions", CVPR 2022 (Oral).
☆141 • Jul 20, 2022 • Updated 4 years ago
wenz116 / DRFT
End-to-end Multi-modal Video Temporal Grounding, NeurIPS 2021
☆18 • Oct 24, 2021 • Updated 4 years ago
showlab / all-in-one
[CVPR2023] All in One: Exploring Unified Video-Language Pre-training
☆281 • Mar 25, 2023 • Updated 3 years ago
tsujuifu / pytorch_violet
A PyTorch implementation of VIOLET
☆138 • Dec 17, 2023 • Updated 2 years ago
ArrowLuo / CLIP4Clip
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
☆1,028 • Apr 12, 2024 • Updated 2 years ago
roeiherz / ORViT
Object-Region Video Transformers
☆24 • Mar 24, 2022 • Updated 4 years ago
VALUE-Leaderboard / StarterCode
Starter Code for VALUE benchmark
☆79 • Aug 23, 2022 • Updated 3 years ago
airsplay / vimpac
☆73 • Jun 3, 2022 • Updated 4 years ago
JonghwanMun / LGI4temporalgrounding
Repository for the CVPR-20 paper "Local-Global Video-Text Interactions for Temporal Grounding"
☆132 • Jul 5, 2021 • Updated 5 years ago
m-bain / frozen-in-time
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [ICCV'21]
☆377 • May 19, 2022 • Updated 4 years ago
salesforce / ALPRO
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
☆188 • May 1, 2025 • Updated last year
mengcaopku / LocVTP
[ECCV 22] LocVTP: Video-Text Pre-training for Temporal Localization
☆39 • Jul 29, 2022 • Updated 3 years ago
microsoft / LAVENDER
A Unified Framework for Video-Language Understanding
☆62 • Jun 17, 2023 • Updated 3 years ago
MCG-NJU / MMN
[AAAI 2022] Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding
☆91 • Nov 16, 2022 • Updated 3 years ago
antoyang / just-ask
[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
☆127 • Sep 29, 2023 • Updated 2 years ago
jayleicn / ClipBERT
[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning…
☆730 • Aug 8, 2023 • Updated 2 years ago
ju-chen / Efficient-Prompt
☆197 • Oct 22, 2022 • Updated 3 years ago
frostinassiky / bsp
Placeholder for code of BSP.
☆11 • Aug 13, 2021 • Updated 4 years ago
weijiawu / BOVText-Benchmark
[NeurIPS2021] BOVText: A Large-Scale, Multidimensional Multilingual Dataset for Video Text Spotting
☆71 • Oct 9, 2023 • Updated 2 years ago
video-reality-test / video-reality-test
☆23 • May 5, 2026 • Updated 2 months ago
linjieli222 / HERO
Research code for EMNLP 2020 paper "HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training"
☆235 • Sep 16, 2021 • Updated 4 years ago
rowanz / merlot
MERLOT: Multimodal Neural Script Knowledge Models
☆226 • Mar 15, 2022 • Updated 4 years ago
CryhanFang / CLIP2Video
☆260 • Dec 10, 2022 • Updated 3 years ago
antoine77340 / MIL-NCE_HowTo100M
PyTorch GPU distributed training code for MIL-NCE HowTo100M
☆221 • Jul 5, 2022 • Updated 4 years ago
showlab / cosmo
☆75 • May 10, 2024 • Updated 2 years ago
Chuhanxx / Temporal_Query_Networks
The implementation of CVPR2021 paper Temporal Query Networks for Fine-grained Video Understanding
☆64 • Mar 9, 2022 • Updated 4 years ago
kodenii / ORES
ORES: Open-vocabulary Responsible Visual Synthesis
☆14 • Dec 12, 2023 • Updated 2 years ago
yytzsy / ABLR_code
The source code of the paper: "To Find Where You Talk: Temporal Sentence Localization in Video with Attention Based Location Regression"
☆30 • Jan 8, 2019 • Updated 7 years ago
xyzforever / BEVT
PyTorch implementation of BEVT (CVPR 2022) https://arxiv.org/abs/2112.01529
☆161 • Jul 19, 2022 • Updated 4 years ago
amazon-science / video-contrastive-learning
Video Contrastive Learning with Global Context, ICCVW 2021
☆162 • May 30, 2022 • Updated 4 years ago
microsoft / UniVL
An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
☆365 • Jul 25, 2024 • Updated last year
jy0205 / STCAT
[NeurIPS 2022] Embracing Consistency: A One-Stage Approach for Spatio-Temporal Video Grounding
☆54 • Mar 5, 2024 • Updated 2 years ago
linjieli222 / HERO_Video_Feature_Extractor
Video Feature Extraction Code for EMNLP 2020 paper "HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training"
☆118 • Jun 9, 2021 • Updated 5 years ago
kevinliang888 / IVR-QA-baselines
[ICCV 2023] Simple Baselines for Interactive Video Retrieval with Questions and Answers
☆20 • Apr 16, 2024 • Updated 2 years ago