neilfei / brivl-nmi
☆60Updated 2 years ago
Related projects
Alternatives and complementary repositories for brivl-nmi
- A curated list of vision-and-language pre-training (VLP). :-)☆56Updated 2 years ago
- UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning☆69Updated 3 years ago
- Bling's Object detection tool☆56Updated last year
- Implementation of the Benchmark Approaches for Medical Instructional Video Classification (MedVidCL) and Medical Video Question Answering…☆28Updated last year
- Bridging Vision and Language Model☆279Updated last year
- A repo for REMOD: relation extraction algorithm based on multimodality knowledge distillation☆28Updated 2 years ago
- Documentation for the WenLan API, which is used to obtain image and text features.☆37Updated last year
- A *tuned* minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training☆107Updated 3 years ago
- [Paper][ISWC 2021] Zero-shot Visual Question Answering using Knowledge Graph☆65Updated 9 months ago
- Attaching human-like eyes to the large language model. The codes of IEEE TMM paper "LMEye: An Interactive Perception Network for Large La…☆48Updated 4 months ago
- ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration☆56Updated last year
- Official code for the NAACL 2022 paper "Fuse It More Deeply! A Variational Transformer with Layer-Wise Latent Variable Inference for Text…☆32Updated 2 years ago
- ☆101Updated 2 years ago
- SimVLM: Simple Visual Language Model Pretraining with Weak Supervision☆35Updated 2 years ago
- Transformer model based on the Gated Attention Unit (preview version)☆97Updated last year
- [TMLR 2022] High-Modality Multimodal Transformer☆107Updated 3 weeks ago
- the world's first large-scale multi-modal short-video encyclopedia, where the primitive units are items, aspects, and short videos.☆60Updated 11 months ago
- PyTorch implementation of MVP: a multi-stage vision-language pre-training framework☆33Updated last year
- [IJCAI 2024] Continual Multimodal Knowledge Graph Construction☆37Updated 2 weeks ago
- Code for the paper "RankingGPT: Empowering Large Language Models in Text Ranking with Progressive Enhancement"☆29Updated 10 months ago
- ☆57Updated 2 years ago
- DSTC10 Track1 - MOD: Internet Meme Incorporated Open-domain Dialog☆49Updated last year
- Source code of LatentOps☆77Updated last year
- Code for EMNLP 2022 paper “Distilled Dual-Encoder Model for Vision-Language Understanding”☆29Updated last year
- ☆63Updated 11 months ago
- CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations, ICCV 2021☆57Updated 2 years ago
- Southeast University Knowledge Graph-OpenRichpedia☆37Updated 3 years ago
- [Paper][IJCKG 2022] LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection☆25Updated 9 months ago
- implementation of paper https://arxiv.org/abs/2210.04559☆54Updated 2 years ago
- Language Models Can See: Plugging Visual Controls in Text Generation☆255Updated 2 years ago