pytorch implementation of mvp: a multi-stage vision-language pre-training framework
☆34Mar 1, 2023Updated 3 years ago
Alternatives and similar repositories for mvp_pytorch
Users that are interested in mvp_pytorch are comparing it to the libraries listed below
Sorting:
- Implementation of 'A Neural Compositional Paradigm for Image Captioning' by B. Dai, S.Fidler, D. Lin☆12Mar 15, 2019Updated 6 years ago
- Multi-index hashing for the resolution of ANN search problem on large datasets☆15Oct 16, 2018Updated 7 years ago
- ☆60Nov 17, 2022Updated 3 years ago
- Controllable mage captioning model with unsupervised modes☆21Apr 14, 2023Updated 2 years ago
- SelfCriticalSequenceTrainingforImageCaptioning☆21May 27, 2017Updated 8 years ago
- Implementation for MAF: Multimodal Alignment Framework☆46Nov 25, 2020Updated 5 years ago
- CVPR 2022 (Oral) Pytorch Code for Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment☆22Apr 15, 2022Updated 3 years ago
- SVL-Adapter: Self-Supervised Adapter for Vision-Language Pretrained Models☆21Jan 11, 2024Updated 2 years ago
- ☆28Feb 10, 2025Updated last year
- Leon Bottou's SGD☆34Oct 13, 2011Updated 14 years ago
- MLPs for Vision and Langauge Modeling (Coming Soon)☆27Dec 9, 2021Updated 4 years ago
- Code for Cross-Modal 3D Shape Generation and Manipulation (ECCV 2022)☆29May 23, 2023Updated 2 years ago
- The official code for paper "EasyGen: Easing Multimodal Generation with a Bidirectional Conditional Diffusion Model and LLMs"☆73Nov 21, 2024Updated last year
- Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning"☆800Jun 30, 2021Updated 4 years ago
- Finetuning & extending DiffusionDet to video & pedestrian multi-object-tracking☆13Apr 12, 2023Updated 2 years ago
- Code for our EMNLP 2022 paper: Generative Entity Typing with Curriculum Learning.☆13Aug 19, 2023Updated 2 years ago
- Style Transfer by Rigid Alignment in Neural Net Feature Space☆11Jan 23, 2021Updated 5 years ago
- 新词发现/新词挖掘/自由度/凝固度/python3☆10May 28, 2019Updated 6 years ago
- HandsOnVLM: Vision-Language Models for Hand-Object Interaction Prediction☆41Sep 15, 2025Updated 5 months ago
- Pytorch Code and Data for EnvEdit: Environment Editing for Vision-and-Language Navigation (CVPR 2022)☆30Aug 2, 2022Updated 3 years ago
- ☆33Jun 4, 2022Updated 3 years ago
- Distilling BERT using natural language generation.☆39Aug 13, 2023Updated 2 years ago
- [ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences☆43Mar 11, 2025Updated 11 months ago
- 深度学习的基础课程☆14May 4, 2018Updated 7 years ago
- Artificial Intelligence project☆10Mar 11, 2016Updated 9 years ago
- ☆12Aug 30, 2022Updated 3 years ago
- Implementation for the paper "Unified Multimodal Model with Unlikelihood Training for Visual Dialog"☆13May 12, 2023Updated 2 years ago
- Federated Meta-Learning for Emotion and Sentiment Aware Multi-modal Complaint Identification☆10May 30, 2024Updated last year
- Detects scene change or cuts in a video file☆11Oct 23, 2017Updated 8 years ago
- 豆瓣电影评论可视化☆10May 19, 2016Updated 9 years ago
- Original VinVL visual backbone with simplified APIs to easily extract features, boxes, object detections, in a few lines of Python code.☆11Nov 27, 2022Updated 3 years ago
- [KDD'22] Partial Label Learning with Discrimination Augmentation☆10May 21, 2024Updated last year
- Code Release for the paper "Make-A-Story: Visual Memory Conditioned Consistent Story Generation" in CVPR 2023☆43Jun 27, 2023Updated 2 years ago
- Proximal Asynchronous SAGA☆13Nov 30, 2017Updated 8 years ago
- [ICLR2024] Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model☆43Dec 25, 2024Updated last year
- code for the paper in NeurIPS 2019☆40Mar 24, 2023Updated 2 years ago
- The Document of WenLan API, which was used to obtain image and text feature.☆41Jan 10, 2023Updated 3 years ago
- X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)☆492Nov 25, 2022Updated 3 years ago
- ☆44Aug 2, 2021Updated 4 years ago