microsoft/UniVL



An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"

☆365

Alternatives and similar repositories for UniVL

Users that are interested in UniVL are comparing it to the libraries listed below.


ArrowLuo / CLIP4Clip
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
☆1,028 · Apr 12, 2024 · Updated 2 years ago
linjieli222 / HERO
Research code for EMNLP 2020 paper "HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training"
☆235 · Sep 16, 2021 · Updated 4 years ago
jayleicn / ClipBERT
[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning…
☆730 · Aug 8, 2023 · Updated 2 years ago
simon-ging / coot-videotext
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning
☆291 · Sep 6, 2022 · Updated 3 years ago
antoine77340 / MIL-NCE_HowTo100M
PyTorch GPU distributed training code for MIL-NCE HowTo100M
☆221 · Jul 5, 2022 · Updated 4 years ago
tsujuifu / pytorch_violet
A PyTorch implementation of VIOLET
☆138 · Dec 17, 2023 · Updated 2 years ago
microsoft / SwinBERT
Research code for CVPR 2022 paper "SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning"
☆250 · May 26, 2022 · Updated 4 years ago
m-bain / frozen-in-time
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [ICCV'21]
☆377 · May 19, 2022 · Updated 4 years ago
salesforce / ALPRO
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
☆188 · May 1, 2025 · Updated last year
antoine77340 / howto100m
Code for the HowTo100M paper
☆303 · Mar 10, 2020 · Updated 6 years ago
showlab / all-in-one
[CVPR2023] All in One: Exploring Unified Video-Language Pre-training
☆281 · Mar 25, 2023 · Updated 3 years ago
ammesatyajit / VideoBERT
Using VideoBERT to tackle video prediction
☆135 · May 10, 2021 · Updated 5 years ago
TencentARC / MCQ
Official code for "Bridging Video-text Retrieval with Multiple Choice Questions", CVPR 2022 (Oral).
☆141 · Jul 20, 2022 · Updated 4 years ago
ArrowLuo / VideoFeatureExtractor
Video Feature Extractor for S3D-HowTo100M
☆29 · Apr 30, 2021 · Updated 5 years ago
VALUE-Leaderboard / StarterCode
Starter Code for VALUE benchmark
☆79 · Aug 23, 2022 · Updated 3 years ago
albanie / collaborative-experts
Video embeddings for retrieval with natural language queries
☆344 · Feb 15, 2023 · Updated 3 years ago
gabeur / mmt
Multi-Modal Transformer for Video Retrieval
☆265 · Oct 9, 2024 · Updated last year
CryhanFang / CLIP2Video
☆260 · Dec 10, 2022 · Updated 3 years ago
microsoft / Oscar
Oscar and VinVL
☆1,054 · Aug 28, 2023 · Updated 2 years ago
Deferf / CLIP_Video_Representation
Use CLIP to represent video for Retrieval Task
☆71 · Mar 1, 2021 · Updated 5 years ago
v-iashin / MDVC
PyTorch implementation of Multi-modal Dense Video Captioning (CVPR 2020 Workshops)
☆144 · Apr 8, 2023 · Updated 3 years ago
zinengtang / VidLanKD
PyTorch version of VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer (NeurIPS 2021)
☆56 · Feb 6, 2023 · Updated 3 years ago
jayleicn / recurrent-transformer
[ACL 2020] PyTorch code for MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning
☆170 · Dec 4, 2020 · Updated 5 years ago
airsplay / vimpac
☆73 · Jun 3, 2022 · Updated 4 years ago
rowanz / merlot
MERLOT: Multimodal Neural Script Knowledge Models
☆226 · Mar 15, 2022 · Updated 4 years ago
v-iashin / BMT
Source code for "Bi-modal Transformer for Dense Video Captioning" (BMVC 2020)
☆231 · Apr 8, 2023 · Updated 3 years ago
salesforce / densecap
☆192 · Jun 16, 2025 · Updated last year
jayleicn / TVRetrieval
[ECCV 2020] PyTorch code for XML on TVRetrieval dataset - TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval
☆163 · May 28, 2024 · Updated 2 years ago
salesforce / ALBEF
Code for ALBEF: a new vision-language pre-training method
☆1,757 · Sep 20, 2022 · Updated 3 years ago
MCG-NJU / CPD-Video
Learning Spatiotemporal Features via Video and Text Pair Discrimination
☆60 · Jan 20, 2021 · Updated 5 years ago
TengdaHan / TemporalAlignNet
[CVPR'22 Oral] Temporal Alignment Networks for Long-term Video. Tengda Han, Weidi Xie, Andrew Zisserman.
☆122 · Oct 9, 2023 · Updated 2 years ago
ylsung / VL_adapter
PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR2022)
☆212 · Dec 18, 2022 · Updated 3 years ago
antoine77340 / S3D_HowTo100M
S3D Text-Video model trained on HowTo100M using MIL-NCE
☆200 · Jul 3, 2020 · Updated 6 years ago
antoine77340 / video_feature_extractor
Easy to use video deep features extractor
☆322 · Jul 5, 2020 · Updated 6 years ago
facebookresearch / TimeSformer
The official PyTorch implementation of our paper "Is Space-Time Attention All You Need for Video Understanding?"
☆1,863 · Apr 9, 2024 · Updated 2 years ago
yuewang-cuhk / awesome-vision-language-pretraining-papers
Recent Advances in Vision and Language PreTrained Models (VL-PTMs)
☆1,159 · Aug 19, 2022 · Updated 3 years ago
Chuhanxx / Temporal_Query_Networks
The implementation of the CVPR 2021 paper Temporal Query Networks for Fine-grained Video Understanding
☆64 · Mar 9, 2022 · Updated 4 years ago
j-min / VL-T5
PyTorch code for "Unifying Vision-and-Language Tasks via Text Generation" (ICML 2021)
☆372 · Jul 29, 2023 · Updated 2 years ago
ChenRocks / UNITER
Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning"
☆800 · Jun 30, 2021 · Updated 5 years ago