neilfei / brivl-nmi
☆59Updated 3 years ago
Alternatives and similar repositories for brivl-nmi
Users that are interested in brivl-nmi are comparing it to the libraries listed below
- Official code for ICLR 2022 paper: "PoNet: Pooling Network for Efficient Token Mixing in Long Sequences".☆33Updated 2 years ago
- A curated list of vision-and-language pre-training (VLP). :-)☆59Updated 3 years ago
- SimVLM ---SIMPLE VISUAL LANGUAGE MODEL PRETRAINING WITH WEAK SUPERVISION☆36Updated 2 years ago
- Bling's Object detection tool☆56Updated 2 years ago
- UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning☆70Updated 4 years ago
- CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations, ICCV 2021☆64Updated 3 years ago
- pytorch implementation of mvp: a multi-stage vision-language pre-training framework☆34Updated 2 years ago
- ☆32Updated 3 years ago
- This repository provides a comprehensive collection of research papers focused on multimodal representation learning, all of which have b…☆81Updated 4 months ago
- The official code for paper "EasyGen: Easing Multimodal Generation with a Bidirectional Conditional Diffusion Model and LLMs"☆73Updated 10 months ago
- [TMLR 2022] High-Modality Multimodal Transformer☆117Updated 11 months ago
- A single-model, multi-scale VAE model based on Transformer☆57Updated 4 years ago
- ☆40Updated 2 years ago
- Southeast University Knowledge Graph-OpenRichpedia☆39Updated 4 years ago
- A simple script for detecting and removing cryptomining malware☆18Updated 3 years ago
- ☆106Updated 3 years ago
- A *tuned* minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training☆118Updated 4 years ago
- Code for EMNLP 2022 paper “Distilled Dual-Encoder Model for Vision-Language Understanding”☆32Updated 2 years ago
- ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration☆56Updated 2 years ago
- Official implementation for "Parameter-Efficient Fine-Tuning Design Spaces"☆27Updated 2 years ago
- the world's first large-scale multi-modal short-video encyclopedia, where the primitive units are items, aspects, and short videos.☆64Updated last year
- TagGPT: Large Language Models are Zero-shot Multimodal Taggers☆64Updated 2 years ago
- A repo for REMOD: relation extraction algorithm based on multimodality knowledge distillation☆28Updated 3 years ago
- Bridging Vision and Language Model☆285Updated 2 years ago
- custom pytorch implementation of MoCo v3☆46Updated 4 years ago
- WuDaoMM this is a data project☆74Updated 3 years ago
- A Transformer model based on the Gated Attention Unit (preview version)☆98Updated 2 years ago
- Learning to Encode Position for Transformer with Continuous Dynamical Model☆60Updated 5 years ago
- implementation of paper https://arxiv.org/abs/2210.04559☆56Updated 3 years ago
- CLIP: Connecting Text and Image (Learning Transferable Visual Models From Natural Language Supervision)☆80Updated 4 years ago