xyzforever/BEVT

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/xyzforever/BEVT)

xyzforever / BEVT

PyTorch implementation of BEVT (CVPR 2022) https://arxiv.org/abs/2112.01529

☆161

Alternatives and similar repositories for BEVT

Users that are interested in BEVT are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

SwinTransformer / Video-Swin-Transformer
View on GitHub
This is an official implementation for "Video Swin Transformers".
☆1,667Mar 8, 2023Updated 3 years ago
ruiwang2021 / mvd
View on GitHub
[CVPR2023] Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning (https://arxiv…
☆135May 21, 2023Updated 3 years ago
TencentARC / MCQ
View on GitHub
Official code for "Bridging Video-text Retrieval with Multiple Choice Questions", CVPR 2022 (Oral).
☆141Jul 20, 2022Updated 4 years ago
MCG-NJU / VideoMAE
View on GitHub
[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
☆1,775Dec 8, 2023Updated 2 years ago
mengcaopku / LocVTP
View on GitHub
[ECCV 22] LocVTP: Video-Text Pre-training for Temporal Localization
☆39Jul 29, 2022Updated 3 years ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
FingerRec / OA-Transformer
View on GitHub
[CVPR 2022] The code for our paper 《Object-aware Video-language Pre-training for Retrieval》
☆61May 25, 2022Updated 4 years ago
microsoft / LAVENDER
View on GitHub
A Unified Framework for Video-Language Understanding
☆62Jun 17, 2023Updated 3 years ago
tsujuifu / pytorch_violet
View on GitHub
A PyTorch implementation of VIOLET
☆138Dec 17, 2023Updated 2 years ago
facebookresearch / TimeSformer
View on GitHub
The official pytorch implementation of our paper "Is Space-Time Attention All You Need for Video Understanding?"
☆1,864Apr 9, 2024Updated 2 years ago
showlab / all-in-one
View on GitHub
[CVPR2023] All in One: Exploring Unified Video-Language Pre-training
☆281Mar 25, 2023Updated 3 years ago
wenz116 / DRFT
View on GitHub
End-to-end Multi-modal Video Temporal Grounding, NeurIPS 2021
☆18Oct 24, 2021Updated 4 years ago
zdou0830 / METER
View on GitHub
METER: A Multimodal End-to-end TransformER Framework
☆377Nov 16, 2022Updated 3 years ago
JaywongWang / CBP
View on GitHub
Official Tensorflow Implementation of the AAAI-2020 paper "Temporally Grounding Language Queries in Videos by Contextual Boundary-aware P…
☆59Mar 24, 2023Updated 3 years ago
jshi31 / NAFAE
View on GitHub
Implementation of paper "Not All Frames Are Equal: Weakly-Supervised Video Grounding with Contextual Similarity and Visual Clustering Los…
☆30Jun 29, 2020Updated 6 years ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
IBM / sifar-pytorch
View on GitHub
super image for action recognition
☆57Mar 8, 2022Updated 4 years ago
microsoft / SimMIM
View on GitHub
This is an official implementation for "SimMIM: A Simple Framework for Masked Image Modeling".
☆1,047Sep 29, 2022Updated 3 years ago
MCG-NJU / MMN
View on GitHub
[AAAI 2022] Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding
☆91Nov 16, 2022Updated 3 years ago
showlab / DemoVLP
View on GitHub
[Arxiv2022] Revitalize Region Feature for Democratizing Video-Language Pre-training
☆22Mar 19, 2022Updated 4 years ago
kahnchana / svt
View on GitHub
Official repository for "Self-Supervised Video Transformer" (CVPR'22)
☆109Jun 26, 2024Updated 2 years ago
linjieli222 / HERO
View on GitHub
Research code for EMNLP 2020 paper "HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training"
☆235Sep 16, 2021Updated 4 years ago
niluthpol / weak_supervised_video_moment
View on GitHub
Weakly Supervised Video Moment Retrieval from Text Queries
☆43Jul 20, 2020Updated 6 years ago
LightDXY / BootMAE
View on GitHub
ECCV2022,Bootstrapped Masked Autoencoders for Vision BERT Pretraining
☆97Nov 2, 2022Updated 3 years ago
VALUE-Leaderboard / DataRelease
View on GitHub
Data Release for VALUE Benchmark
☆30Feb 16, 2022Updated 4 years ago
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
m-bain / frozen-in-time
View on GitHub
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [ICCV'21]
☆376May 19, 2022Updated 4 years ago
showlab / Region_Learner
View on GitHub
The Pytorch implementation for "Video-Text Pre-training with Learned Regions"
☆43Jul 15, 2022Updated 4 years ago
jayleicn / moment_detr
View on GitHub
[NeurIPS 2021] Moment-DETR code and QVHighlights dataset
☆349Mar 9, 2026Updated 4 months ago
princetonvisualai / MQVR
View on GitHub
☆26Jan 12, 2022Updated 4 years ago
ucasligang / SimViT
View on GitHub
[ICME 2022] code for the paper, SimVit: Exploring a simple vision transformer with sliding windows.
☆67Oct 11, 2022Updated 3 years ago
jayleicn / VideoLanguageFuturePred
View on GitHub
[EMNLP 2020] What is More Likely to Happen Next? Video-and-Language Future Event Prediction
☆52Aug 20, 2022Updated 3 years ago
MichiganCOG / Video-Grounding-from-Text
View on GitHub
Source code for "Weakly-Supervised Video Object Grounding from Text by Loss Weighting and Object Interaction"
☆47Jun 22, 2024Updated 2 years ago
zfchenUnique / WSSTG
View on GitHub
This repository contains the main baselines introduced in WSSTG (ACL 2019).
☆57Jul 8, 2024Updated 2 years ago
CryhanFang / CLIP2Video
View on GitHub
☆260Dec 10, 2022Updated 3 years ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
Shahzadnit / EZ-CLIP
View on GitHub
☆24May 11, 2025Updated last year
microsoft / UniVL
View on GitHub
An official implementation for " UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
☆366Jul 25, 2024Updated 2 years ago
salesforce / ALPRO
View on GitHub
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
☆188May 1, 2025Updated last year
26hzhang / ReLoCLNet
View on GitHub
Video Corpus Moment Retrieval with Contrastive Learning (SIGIR 2021)
☆58Aug 31, 2021Updated 4 years ago
ucasligang / awesome-MIM
View on GitHub
Reading list for research topics in Masked Image Modeling
☆333Dec 3, 2024Updated last year
microsoft / XPretrain
View on GitHub
Multi-modality pre-training
☆511Mar 27, 2026Updated 3 months ago
JonghwanMun / LGI4temporalgrounding
View on GitHub
Repository for the CVPR-20 paper "Local-Global Video-Text Interactions for Temporal Grounding"
☆132Jul 5, 2021Updated 5 years ago