rowanz/merlot

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/rowanz/merlot)

rowanz / merlot

MERLOT: Multimodal Neural Script Knowledge Models

☆226

Alternatives and similar repositories for merlot

Users that are interested in merlot are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

rowanz / merlot_reserve
View on GitHub
Code release for "MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound"
☆146Jun 1, 2022Updated 4 years ago
tsujuifu / pytorch_violet
View on GitHub
A PyTorch implementation of VIOLET
☆138Dec 17, 2023Updated 2 years ago
VALUE-Leaderboard / DataRelease
View on GitHub
Data Release for VALUE Benchmark
☆30Feb 16, 2022Updated 4 years ago
antoyang / just-ask
View on GitHub
[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
☆127Sep 29, 2023Updated 2 years ago
jayleicn / ClipBERT
View on GitHub
[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning…
☆730Aug 8, 2023Updated 2 years ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
VALUE-Leaderboard / StarterCode
View on GitHub
Starter Code for VALUE benchmark
☆79Aug 23, 2022Updated 3 years ago
jayleicn / mTVRetrieval
View on GitHub
[ACL 2021] mTVR: Multilingual Video Moment Retrieval
☆27Aug 20, 2022Updated 3 years ago
jimmy646 / violin
View on GitHub
Data and code for CVPR 2020 paper: "VIOLIN: A Large-Scale Dataset for Video-and-Language Inference"
☆161Apr 29, 2020Updated 6 years ago
salesforce / ALPRO
View on GitHub
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
☆188May 1, 2025Updated last year
MikeWangWZHL / VidIL
View on GitHub
Pytorch code for Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
☆117Sep 15, 2022Updated 3 years ago
linjieli222 / HERO
View on GitHub
Research code for EMNLP 2020 paper "HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training"
☆235Sep 16, 2021Updated 4 years ago
zinengtang / DeCEMBERT
View on GitHub
Pytorch version of DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization (NAACL 2021)
☆17Jan 12, 2023Updated 3 years ago
InterDigitalInc / DialogSummary-VideoQA
View on GitHub
☆10Mar 30, 2022Updated 4 years ago
rowanz / piglet
View on GitHub
PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World [ACL 2021]
☆56Oct 29, 2021Updated 4 years ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
antoine77340 / MIL-NCE_HowTo100M
View on GitHub
PyTorch GPU distributed training code for MIL-NCE HowTo100M
☆221Jul 5, 2022Updated 4 years ago
showlab / all-in-one
View on GitHub
[CVPR2023] All in One: Exploring Unified Video-Language Pre-training
☆281Mar 25, 2023Updated 3 years ago
m-bain / frozen-in-time
View on GitHub
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [ICCV'21]
☆376May 19, 2022Updated 4 years ago
zinengtang / VidLanKD
View on GitHub
Pytorch version of VidLanKD: Improving Language Understanding viaVideo-Distilled Knowledge Transfer (NeurIPS 2021))
☆56Feb 6, 2023Updated 3 years ago
TencentARC / MCQ
View on GitHub
Official code for "Bridging Video-text Retrieval with Multiple Choice Questions", CVPR 2022 (Oral).
☆141Jul 20, 2022Updated 4 years ago
pzzhang / VinVL
View on GitHub
project page for VinVL
☆360Jul 26, 2023Updated 2 years ago
rxtan2 / video-grounding-narrations
View on GitHub
☆12Mar 12, 2023Updated 3 years ago
showlab / Region_Learner
View on GitHub
The Pytorch implementation for "Video-Text Pre-training with Learned Regions"
☆43Jul 15, 2022Updated 4 years ago
ylsung / VL_adapter
View on GitHub
PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR2022)
☆212Dec 18, 2022Updated 3 years ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
airsplay / vimpac
View on GitHub
☆73Jun 3, 2022Updated 4 years ago
berniebear / Multi-HT100M
View on GitHub
☆53Dec 6, 2021Updated 4 years ago
jayleicn / singularity
View on GitHub
[ACL 2023] Official PyTorch code for Singularity model in "Revealing Single Frame Bias for Video-and-Language Learning"
☆136May 5, 2023Updated 3 years ago
ych133 / How2R-and-How2QA
View on GitHub
A video retrieval dataset How2R and a video QA dataset How2QA
☆24Oct 15, 2020Updated 5 years ago
microsoft / UniVL
View on GitHub
An official implementation for " UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
☆366Jul 25, 2024Updated last year
e-bug / cross-modal-ablation
View on GitHub
[EMNLP 2021] Code and data for our paper "Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers…
☆20Jan 17, 2022Updated 4 years ago
clip-vil / CLIP-ViL
View on GitHub
[ICLR 2022] code for "How Much Can CLIP Benefit Vision-and-Language Tasks?" https://arxiv.org/abs/2107.06383
☆419Oct 28, 2022Updated 3 years ago
TengdaHan / TemporalAlignNet
View on GitHub
[CVPR'22 Oral] Temporal Alignment Networks for Long-term Video. Tengda Han, Weidi Xie, Andrew Zisserman.
☆122Oct 9, 2023Updated 2 years ago
antoine77340 / S3D_HowTo100M
View on GitHub
S3D Text-Video model trained on HowTo100M using MIL-NCE
☆200Jul 3, 2020Updated 6 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
antoyang / FrozenBiLM
View on GitHub
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
☆159Dec 9, 2024Updated last year
FingerRec / OA-Transformer
View on GitHub
[CVPR 2022] The code for our paper 《Object-aware Video-language Pre-training for Retrieval》
☆61May 25, 2022Updated 4 years ago
researchmm / soho
View on GitHub
[CVPR'21 Oral] Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning
☆208Sep 30, 2022Updated 3 years ago
ChenRocks / UNITER
View on GitHub
Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning"
☆799Jun 30, 2021Updated 5 years ago
airsplay / vokenization
View on GitHub
PyTorch code for EMNLP 2020 Paper "Vokenization: Improving Language Understanding with Visual Supervision"
☆191Mar 8, 2021Updated 5 years ago
yuewang-cuhk / awesome-vision-language-pretraining-papers
View on GitHub
Recent Advances in Vision and Language PreTrained Models (VL-PTMs)
☆1,159Aug 19, 2022Updated 3 years ago
jamespark3922 / visual-comet
View on GitHub
VisualCOMET: Reasoning about the Dynamic Context of a Still Image
☆87Jun 12, 2023Updated 3 years ago