A PyTorch implementation of the Attention on Attention module (both self and guided variants) for Visual Question Answering
☆43 · Nov 8, 2020 · Updated 5 years ago
Alternatives and similar repositories for AoA-pytorch
Users that are interested in AoA-pytorch are comparing it to the libraries listed below.
- An implementation of (Induced) Set Attention Block, from the Set Transformers paper ☆68 · Jan 10, 2023 · Updated 3 years ago
- Shows visual grounding methods can be right for the wrong reasons! (ACL 2020) ☆23 · Jun 26, 2020 · Updated 5 years ago
- A simple Transformer where the softmax has been replaced with normalization ☆20 · Sep 11, 2020 · Updated 5 years ago
- [CVPR 2026] Elucidating the SNR-t Bias of Diffusion Probabilistic Models ☆112 · Apr 20, 2026 · Updated 2 weeks ago
- Implementation of Insertion-deletion Denoising Diffusion Probabilistic Models ☆30 · May 31, 2022 · Updated 3 years ago
- Includes additional materials for the following keras.io blog post. ☆12 · Jun 23, 2021 · Updated 4 years ago
- ☆37 · Jan 20, 2023 · Updated 3 years ago
- Implementation of a Transformer using ReLA (Rectified Linear Attention) from https://arxiv.org/abs/2104.07012 ☆49 · Apr 6, 2022 · Updated 4 years ago
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount… ☆53 · Oct 22, 2023 · Updated 2 years ago
- A variant of Transformer-XL where the memory is updated not with a queue, but with attention ☆49 · Jul 31, 2020 · Updated 5 years ago
- Usable implementation of Mogrifier, a circuit for enhancing LSTMs and potentially other networks, from Deepmind ☆22 · Jun 9, 2024 · Updated last year
- Implementation and explorations into Blackbox Gradient Sensing (BGS), an evolutionary strategies approach proposed in a Google Deepmind p… ☆20 · Apr 17, 2026 · Updated 3 weeks ago
- Implementation of Tranception, an attention network, paired with retrieval, that is SOTA for protein fitness prediction ☆32 · Jun 19, 2022 · Updated 3 years ago
- An attempt to merge ESBN with Transformers, to endow Transformers with the ability to emergently bind symbols ☆16 · Aug 3, 2021 · Updated 4 years ago
- Pytorch implementation of the hamburger module from the ICLR 2021 paper "Is Attention Better Than Matrix Decomposition" ☆99 · Jan 13, 2021 · Updated 5 years ago
- Implementation of Gradient Agreement Filtering, from Chaubard et al. of Stanford, but for single machine microbatches, in Pytorch ☆25 · Jan 21, 2025 · Updated last year
- My own attempt at a long context genomics model, leveraging recent advances in long context attention modeling (Flash Attention + other h… ☆54 · Jul 2, 2023 · Updated 2 years ago
- Implementation of the Triangle Multiplicative module, used in Alphafold2 as an efficient way to mix rows or columns of a 2d feature map, … ☆39 · Aug 3, 2021 · Updated 4 years ago
- Minimal implementation of Denoised Smoothing (https://arxiv.org/abs/2003.01908) in TensorFlow. ☆20 · Aug 4, 2021 · Updated 4 years ago
- This repository provides the dataset introduced by our WSSTG paper ☆13 · Jul 21, 2019 · Updated 6 years ago
- Implementation of Denoising Diffusion for protein design, but using the new Equiformer (successor to SE3 Transformers) with some addition… ☆57 · Dec 27, 2022 · Updated 3 years ago
- Implementation of the Remixer Block from the Remixer paper, in Pytorch ☆36 · Sep 27, 2021 · Updated 4 years ago
- Learning Long-term Visual Dynamics with Region Proposal Interaction Networks (ICLR 2021) ☆113 · May 29, 2022 · Updated 3 years ago
- Implementation of Cross Transformer for spatially-aware few-shot transfer, in Pytorch ☆54 · Mar 30, 2021 · Updated 5 years ago
- Implementation of paper "Not All Frames Are Equal: Weakly-Supervised Video Grounding with Contextual Similarity and Visual Clustering Los… ☆30 · Jun 29, 2020 · Updated 5 years ago
- Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning ☆166 · Feb 12, 2024 · Updated 2 years ago
- Implementation of Kronecker Attention in Pytorch ☆20 · Sep 12, 2020 · Updated 5 years ago
- A GPT, made only of MLPs, in Jax ☆59 · Jun 23, 2021 · Updated 4 years ago
- Implementation of OmniNet, Omnidirectional Representations from Transformers, in Pytorch ☆59 · Mar 19, 2021 · Updated 5 years ago
- Implementation of Metaformer, but in an autoregressive manner ☆26 · Jun 21, 2022 · Updated 3 years ago
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts ☆122 · Oct 17, 2024 · Updated last year
- Implementation of a holodeck, written in Pytorch ☆19 · Nov 1, 2023 · Updated 2 years ago
- Another attempt at a long-context / efficient transformer by me ☆38 · Apr 11, 2022 · Updated 4 years ago
- ☆15 · Oct 27, 2020 · Updated 5 years ago
- Exploration into the Scaling Value Iteration Networks paper, from Schmidhuber's group ☆37 · Sep 23, 2024 · Updated last year
- Local Attention - Flax module for Jax ☆22 · May 26, 2021 · Updated 4 years ago
- MMBERT: Multimodal BERT Pretraining for Improved Medical VQA ☆39 · Mar 22, 2021 · Updated 5 years ago
- Implementation of an Attention layer where each head can attend to more than just one token, using coordinate descent to pick topk ☆47 · Jul 16, 2023 · Updated 2 years ago
- Multiple Meta-model Quantifying for Medical Visual Question Answering (MICCAI 2021) ☆37 · Apr 21, 2026 · Updated 2 weeks ago