Junction4Nako/mvp_pytorch

Junction4Nako / mvp_pytorch

PyTorch implementation of MVP: a multi-stage vision-language pre-training framework

☆35

Alternatives and similar repositories for mvp_pytorch

Users who are interested in mvp_pytorch are comparing it to the libraries listed below.

ajaysub110 / A-Neural-Compositional-Paradigm-for-Image-Captioning
Implementation of 'A Neural Compositional Paradigm for Image Captioning' by B. Dai, S. Fidler, D. Lin
☆12 · Mar 15, 2019 · Updated 7 years ago
bladewaltz1 / ModeCap
Controllable image captioning model with unsupervised modes
☆21 · Apr 14, 2023 · Updated 3 years ago
Weili-NLP / SelfCriticalSequenceTraining-tensorflow
Self-critical sequence training for image captioning
☆21 · May 27, 2017 · Updated 9 years ago
zmykevin / UVLP
CVPR 2022 (Oral) PyTorch code for Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment
☆21 · Apr 15, 2022 · Updated 4 years ago
MUGE-2021 / image-retrieval-baseline
☆60 · Nov 17, 2022 · Updated 3 years ago
easonnie / mlp-vil
MLPs for Vision and Language Modeling (Coming Soon)
☆27 · Dec 9, 2021 · Updated 4 years ago
christophschuhmann / 4MC-4M-Image-Text-Pairs-with-CLIP-embeddings
I have created a dataset of image-text pairs by using the cosine similarity of the CLIP embeddings of the image & its caption derived f…
☆17 · Apr 22, 2021 · Updated 5 years ago
FudanDISC / weakly-supervised-mVLP
Implementation of our ACL 2023 paper: Unifying Cross-Lingual and Cross-Modal Modeling Towards Weakly Supervised Multilingual Vision-Langua…
☆19 · Jul 5, 2023 · Updated 3 years ago
willard-yuan / video-text-retrieval-papers
☆15 · Sep 16, 2021 · Updated 4 years ago
lvyiwei1 / DIME
☆11 · Aug 20, 2024 · Updated last year
darsh10 / split_encoder_pointer_summarizer
☆12 · Feb 18, 2020 · Updated 6 years ago
forkarinda / MFN
Multistage Fusion with Forget Gate for Multimodal Summarization in Open-Domain Videos
☆12 · Oct 8, 2020 · Updated 5 years ago
zhaoyanpeng / cpcfg
Fast and Modularized CFG-focused Models
☆23 · Nov 8, 2023 · Updated 2 years ago
jaeyun95 / pre-trained-vlk-model
Summary of pre-trained vision-and-language models
☆12 · Apr 20, 2021 · Updated 5 years ago
Luoyadan / MM2020_ABG
Official PyTorch implementation of the paper "Adversarial Bipartite Graph Learning for Video Domain Adaptation" (MM 2020 Oral)
☆11 · Jun 16, 2022 · Updated 4 years ago
threelittlemonkeys / transformer-pytorch
The Transformer in PyTorch
☆13 · Aug 7, 2024 · Updated last year
thecharm / Abs-LRModel
Code for the COLING 2020 paper "Controllable Abstractive Sentence Summarization with Guiding Entities"
☆12 · Dec 24, 2020 · Updated 5 years ago
qinzzz / Multimodal-Alignment-Framework
Implementation for MAF: Multimodal Alignment Framework
☆45 · Nov 25, 2020 · Updated 5 years ago
omerarshad / MultiModalNER
Code for the paper "Aiding Intra-Text Representations with Visual Context for Multimodal Named Entity Recognition"
☆16 · Aug 19, 2019 · Updated 6 years ago
ivanlai / Conditional_Text_Generation
Conditional text generation by fine-tuning GPT2 on News Aggregator data
☆15 · Jan 24, 2021 · Updated 5 years ago
pzzhang / VinVL
Project page for VinVL
☆360 · Jul 26, 2023 · Updated 2 years ago
wxpkanon / CLEDforHTC
☆13 · Apr 10, 2023 · Updated 3 years ago
Xingwei-Tan / hyper-event-TempRel
Poincaré Event Temporal Embeddings and Hyperbolic GRU for Event TempRel Extraction
☆11 · Nov 8, 2021 · Updated 4 years ago
microsoft / Oscar
Oscar and VinVL
☆1,054 · Aug 28, 2023 · Updated 2 years ago
thecharm / Mega
Code for the ACM MM 2021 paper "Multimodal Relation Extraction with Efficient Graph Alignment".
☆112 · Aug 2, 2022 · Updated 3 years ago
ChenRocks / UNITER
Research code for the ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning"
☆799 · Jun 30, 2021 · Updated 5 years ago
KevinLight831 / P9D
Download methods for the vision-language continual pretraining dataset P9D.
☆12 · Jan 3, 2025 · Updated last year
GuangyanS / Sys2-LLaVA
☆31 · Feb 10, 2025 · Updated last year
InnerPeace-Wu / im2p-tensorflow
Implementation of the CVPR 2017 paper "A Hierarchical Approach for Generating Descriptive Image Paragraphs" in TensorFlow (in progress...)
☆13 · Jan 27, 2018 · Updated 8 years ago
xiyan-fu / MM-AVS
A Full-Scale Dataset for Multi-modal Summarization
☆16 · Dec 8, 2021 · Updated 4 years ago
wwangwitsel / PLDA
[KDD'22] Partial Label Learning with Discrimination Augmentation
☆10 · May 21, 2024 · Updated 2 years ago
skeletonNN / NHFNet
☆36 · Dec 22, 2021 · Updated 4 years ago
MAC-AutoML / PCL_AutoML_System
☆10 · Jul 30, 2021 · Updated 4 years ago
gchhablani / multilingual-image-captioning
☆43 · Aug 2, 2021 · Updated 4 years ago
evanhu1 / pytorch-CelebA-faCeGAN
Deep convolutional conditional GAN implementation with CelebA dataset that allows for generation of custom faces according to textual inp…
☆18 · Jun 15, 2021 · Updated 5 years ago
bugensui / WenTianSearch
Alibaba "Lingjie" Wentian Engine e-commerce search algorithm competition, ranked 13/2771
☆10 · Jul 31, 2022 · Updated 3 years ago
showlab / MovieSeq
[ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences
☆46 · Mar 11, 2025 · Updated last year
SsGood / ADGCN
PyTorch implementation for the paper "Adversarial Graph Disentanglement"
☆13 · Jul 18, 2023 · Updated 3 years ago
josiahwang / phraseloceval
Phrase Localization Evaluation Toolkit
☆20 · Aug 16, 2019 · Updated 6 years ago