google-deepmind/multimodal_transformers


☆67

Alternatives and similar repositories for multimodal_transformers

Users who are interested in multimodal_transformers are comparing it to the repositories listed below.

The-Swarm-Corporation / AgentParse
AgentParse is a high-performance parsing library designed to map various structured data formats (such as Pydantic models, JSON, YAML, an…
☆18 · Oct 13, 2025 · Updated 8 months ago
rayruizhiliao / mutual_info_img_txt
Joint learning of images and text via maximization of mutual information
☆19 · Dec 14, 2021 · Updated 4 years ago
NewsStoriesData / newsstories.github.io
☆22 · Sep 20, 2022 · Updated 3 years ago
kyegomez / MELLE
An open source community implementation of the model MELLE from the paper: "Autoregressive Speech Synthesis without Vector Quantization"
☆16 · Jun 22, 2026 · Updated last week
stuarthalloway / exploring-clojure
template project for exploring Clojure
☆39 · May 27, 2015 · Updated 11 years ago
ljjcoder / CSEI
Learning Intact Features by Erasing-Inpainting for Few-shot Classification
☆18 · Jul 28, 2022 · Updated 3 years ago
sjtuytc / Neurips21-ProTo-Program-guided-Transformers-for-Program-guided-Tasks
Official code repo for "ProTo: Program-guided Transformers for Program-guided Tasks"
☆21 · Apr 15, 2022 · Updated 4 years ago
cuiyuhao1996 / mcan-vqa
Deep Modular Co-Attention Networks for Visual Question Answering
☆10 · Jul 10, 2019 · Updated 6 years ago
evanmiltenburg / MeasureDiversity
Measure the diversity of image descriptions, repository for our COLING 2018 paper.
☆13 · Dec 29, 2019 · Updated 6 years ago
amazon-science / embert
Code for EmBERT, a transformer model for embodied, language-guided visual task completion.
☆60 · Apr 10, 2024 · Updated 2 years ago
kayburns / tom-qa-dataset
☆24 · Oct 31, 2018 · Updated 7 years ago
malihealikhani / Cross-modal_Coherence_Modeling
Cross-modal Coherence Modeling for Caption Generation
☆11 · Jul 24, 2020 · Updated 5 years ago
google-deepmind / tell_me_why_explanations_rl
☆37 · Apr 27, 2023 · Updated 3 years ago
kyegomez / MLXTransformer
Simple Implementation of a Transformer in the new framework MLX by Apple
☆19 · Nov 18, 2024 · Updated last year
cuicathy / MMD_SurvivalPrediction
Code and data for MICCAI 2022 accepted paper: Survival Prediction of Brain Cancer with Incomplete Radiology, Pathology, Genomics, and Dem…
☆19 · Mar 13, 2024 · Updated 2 years ago
shriphani / sleipnir
A simple, performant web crawler for Clojure
☆17 · Mar 12, 2015 · Updated 11 years ago
allenai / faithful-nmn
Evaluating and improving the faithfulness of the interpretations offered by Neural Module Networks
☆13 · Jun 12, 2023 · Updated 3 years ago
yeonsw / LOUVRE
☆13 · Jun 21, 2021 · Updated 5 years ago
eriche2016 / image_caption_with_semantic_attenion
image caption with semantic attention
☆11 · Apr 1, 2017 · Updated 9 years ago
mugen-org / MUGEN_coinrun
A repository for the updated version of CoinRun used to collect MUGEN, a multimodal video-audio-text dataset. This repo contains scripts …
☆13 · Jul 13, 2022 · Updated 3 years ago
ketranm / sa-nmt
structured attention encoder
☆13 · Jun 6, 2018 · Updated 8 years ago
Agora-Lab-AI / The-Distiller
Generate high-quality textual or multimodal datasets with agents
☆18 · Jun 7, 2023 · Updated 3 years ago
jason9693 / FROZEN
☆14 · May 3, 2022 · Updated 4 years ago
lsq960124 / StyleBERT
Implementation of the paper: StyleBERT: Text-Audio Sentiment Analysis with Bi-directional Style Enhancement
☆14 · Apr 10, 2023 · Updated 3 years ago
MILVLG / rosita
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
☆57 · Jun 13, 2023 · Updated 3 years ago
google-deepmind / geomatch
☆18 · Dec 11, 2023 · Updated 2 years ago
ethanjperez / rda
Code for "Rissanen Data Analysis: Examining Dataset Characteristics via Description Length" by Ethan Perez, Douwe Kiela, and Kyungyhun Ch…
☆37 · Jun 10, 2021 · Updated 5 years ago
fudan-zvg / TDAS
☆18 · Jun 10, 2022 · Updated 4 years ago
agiresearch / EmojiCrypt
EmojiCrypt: Prompt Encryption for Secure Communication with Large Language Models
☆26 · Feb 21, 2024 · Updated 2 years ago
kbyran / POINet
Person of Interest. A flexible computer vision library on human analysis, such as person re-identification, human attribute, pose estima…
☆10 · May 4, 2020 · Updated 6 years ago
Ironieser / MMTok
[ICLR 2026] The official repo of "MMTok: Multimodal Coverage Maximization for Efficient Inference of VLMs"
☆44 · Mar 11, 2026 · Updated 3 months ago
deep-spin / infinite-former
☆68 · Aug 29, 2024 · Updated last year
LukasStruppek / Exploiting-Cultural-Biases-via-Homoglyphs
[Journal of Artificial Intelligence Research] Source code for our paper "Exploiting Cultural Biases via Homoglyphs in Text-to-Image Synth…
☆12 · Jan 8, 2024 · Updated 2 years ago
markendo / GaitForeMer
Code for GaitForeMer.
☆23 · Aug 10, 2023 · Updated 2 years ago
jmhessel / pycocoevalcap
Python 3 support for the MS COCO caption evaluation tools
☆14 · Jun 14, 2024 · Updated 2 years ago
alawryaguila / normativecVAE
Python code for "Conditional VAEs for Confound Removal and Normative Modelling of Neurodegenerative Diseases"
☆10 · Oct 3, 2022 · Updated 3 years ago
Muccul / VQA-Chinese-tf2
VQA-tf2
☆12 · Mar 16, 2021 · Updated 5 years ago
LevinRoman / parameter-space-saliency
Parameter-Space Saliency Maps for Explainability
☆23 · Mar 21, 2023 · Updated 3 years ago
google-deepmind / ssl_hsic
☆39 · Jul 30, 2024 · Updated last year