showlab/all-in-one

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/showlab/all-in-one)

showlab / all-in-one

[CVPR2023] All in One: Exploring Unified Video-Language Pre-training

☆281

Alternatives and similar repositories for all-in-one

Users that are interested in all-in-one are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

showlab / Region_Learner
View on GitHub
The Pytorch implementation for "Video-Text Pre-training with Learned Regions"
☆43Jul 15, 2022Updated 4 years ago
tsujuifu / pytorch_violet
View on GitHub
A PyTorch implementation of VIOLET
☆138Dec 17, 2023Updated 2 years ago
showlab / DemoVLP
View on GitHub
[Arxiv2022] Revitalize Region Feature for Democratizing Video-Language Pre-training
☆22Mar 19, 2022Updated 4 years ago
m-bain / frozen-in-time
View on GitHub
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [ICCV'21]
☆376May 19, 2022Updated 4 years ago
microsoft / LAVENDER
View on GitHub
A Unified Framework for Video-Language Understanding
☆62Jun 17, 2023Updated 3 years ago
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
TencentARC / MCQ
View on GitHub
Official code for "Bridging Video-text Retrieval with Multiple Choice Questions", CVPR 2022 (Oral).
☆141Jul 20, 2022Updated 4 years ago
jayleicn / ClipBERT
View on GitHub
[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning…
☆730Aug 8, 2023Updated 2 years ago
showlab / cosmo
View on GitHub
☆75May 10, 2024Updated 2 years ago
jayleicn / singularity
View on GitHub
[ACL 2023] Official PyTorch code for Singularity model in "Revealing Single Frame Bias for Video-and-Language Learning"
☆136May 5, 2023Updated 3 years ago
ArrowLuo / CLIP4Clip
View on GitHub
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
☆1,030Apr 12, 2024Updated 2 years ago
FingerRec / OA-Transformer
View on GitHub
[CVPR 2022] The code for our paper 《Object-aware Video-language Pre-training for Retrieval》
☆61May 25, 2022Updated 4 years ago
salesforce / ALPRO
View on GitHub
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
☆188May 1, 2025Updated last year
microsoft / UniVL
View on GitHub
An official implementation for " UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
☆366Jul 25, 2024Updated 2 years ago
sail-sg / ptp
View on GitHub
[CVPR2023] The code for 《Position-guided Text Prompt for Vision-Language Pre-training》
☆150Jun 7, 2023Updated 3 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
klauscc / VindLU
View on GitHub
☆108Dec 23, 2022Updated 3 years ago
antoyang / FrozenBiLM
View on GitHub
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
☆159Dec 9, 2024Updated last year
Yui010206 / SeViLA
View on GitHub
[NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering
☆197Jan 14, 2024Updated 2 years ago
ylsung / VL_adapter
View on GitHub
PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR2022)
☆212Dec 18, 2022Updated 3 years ago
CSU-JPG / TextAtlas
View on GitHub
[ICML 2026]A Large-scale Dataset for training and evaluating model's ability on Dense Text Image Generation
☆94Sep 27, 2025Updated 10 months ago
airsplay / vimpac
View on GitHub
☆73Jun 3, 2022Updated 4 years ago
showlab / GEB-Plus
View on GitHub
[ECCV 2022] GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval
☆17Aug 24, 2022Updated 3 years ago
rowanz / merlot
View on GitHub
MERLOT: Multimodal Neural Script Knowledge Models
☆226Mar 15, 2022Updated 4 years ago
showlab / EgoVLP
View on GitHub
[NeurIPS 2022] Egocentric Video-Language Pretraining
☆261May 9, 2024Updated 2 years ago
Simple, predictable pricing with DigitalOcean hosting • Ad
Always know what you'll pay with monthly caps and flat pricing. Enterprise-grade infrastructure trusted by 600k+ customers.
antoyang / just-ask
View on GitHub
[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
☆127Sep 29, 2023Updated 2 years ago
microsoft / XPretrain
View on GitHub
Multi-modality pre-training
☆511Mar 27, 2026Updated 4 months ago
VALUE-Leaderboard / DataRelease
View on GitHub
Data Release for VALUE Benchmark
☆30Feb 16, 2022Updated 4 years ago
showlab / Show-Anything-3D
View on GitHub
Edit and Generate Anything in 3D world!
☆13Apr 15, 2023Updated 3 years ago
jayleicn / mTVRetrieval
View on GitHub
[ACL 2021] mTVR: Multilingual Video Moment Retrieval
☆27Aug 20, 2022Updated 3 years ago
rowanz / merlot_reserve
View on GitHub
Code release for "MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound"
☆146Jun 1, 2022Updated 4 years ago
researchmm / soho
View on GitHub
[CVPR'21 Oral] Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning
☆208Sep 30, 2022Updated 3 years ago
zengyan-97 / X-VLM
View on GitHub
X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)
☆507Nov 25, 2022Updated 3 years ago
ju-chen / Efficient-Prompt
View on GitHub
☆197Oct 22, 2022Updated 3 years ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
antoine77340 / MIL-NCE_HowTo100M
View on GitHub
PyTorch GPU distributed training code for MIL-NCE HowTo100M
☆221Jul 5, 2022Updated 4 years ago
linjieli222 / HERO
View on GitHub
Research code for EMNLP 2020 paper "HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training"
☆235Sep 16, 2021Updated 4 years ago
opendatalab / CLIP-Parrot-Bias
View on GitHub
ECCV2024_Parrot Captions Teach CLIP to Spot Text
☆66Sep 6, 2024Updated last year
albanie / collaborative-experts
View on GitHub
Video embeddings for retrieval with natural language queries
☆344Feb 15, 2023Updated 3 years ago
StanLei52 / GEBD
View on GitHub
[ICCV2021] Generic Event Boundary Detection: A Benchmark for Event Segmentation
☆79Dec 28, 2021Updated 4 years ago
facebookresearch / SLIP
View on GitHub
Code release for SLIP Self-supervision meets Language-Image Pre-training
☆791Feb 9, 2023Updated 3 years ago
microsoft / VideoX
View on GitHub
VideoX: a collection of video cross-modal models
☆1,071Jun 3, 2024Updated 2 years ago