Code publication to the paper "Normalized Attention Without Probability Cage"
★17 · Nov 9, 2021 · Updated 4 years ago
Alternatives and similar repositories for normalized-attention
Users that are interested in normalized-attention are comparing it to the libraries listed below.
- ★21 · Mar 15, 2023 · Updated 3 years ago
- ★11 · Jun 9, 2020 · Updated 5 years ago
- Code for the paper "Query-Key Normalization for Transformers" ★52 · Mar 6, 2021 · Updated 5 years ago
- ★12 · Mar 16, 2022 · Updated 4 years ago
- The official repository for our paper "The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization". ★34 · Jun 11, 2025 · Updated 9 months ago
- ★22 · May 3, 2022 · Updated 3 years ago
- A JAX nn library ★21 · Sep 9, 2025 · Updated 6 months ago
- MXNet/Gluon implement of L-GM-Loss ★11 · Oct 17, 2018 · Updated 7 years ago
- Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing ★49 · Jan 27, 2022 · Updated 4 years ago
- A python library for highly configurable transformers - easing model architecture search and experimentation. ★48 · Nov 30, 2021 · Updated 4 years ago
- [ACL 2021 Findings] HySPA: Hybrid Span Generation for Scalable Text-to-Graph Extraction ★10 · Sep 16, 2021 · Updated 4 years ago
- Provides LiveReload.js compatible server as Boot task ★11 · Nov 28, 2017 · Updated 8 years ago
- [EMNLP 2022] Language Model Pre-Training with Sparse Latent Typing ★14 · Feb 10, 2023 · Updated 3 years ago
- Maximal Mutual Information (MMI) Tagger ★25 · Jun 6, 2019 · Updated 6 years ago
- ★20 · Mar 14, 2021 · Updated 5 years ago
- sigma-MoE layer ★21 · Jan 5, 2024 · Updated 2 years ago
- The Codebase for Causal Distillation for Language Models (NAACL '22) ★26 · May 1, 2022 · Updated 3 years ago
- Randomized Smoothing of All Shapes and Sizes (ICML 2020). ★51 · Jul 23, 2020 · Updated 5 years ago
- We got a stew going! ★27 · Oct 3, 2023 · Updated 2 years ago
- An implementation of 2021 paper by Geoffrey Hinton: "How to represent part-whole hierarchies in a neural network" in Pytorch. ★57 · Mar 29, 2021 · Updated 4 years ago
- API for accessing the GraphLog dataset ★90 · May 3, 2024 · Updated last year
- ★42 · May 18, 2020 · Updated 5 years ago
- ★11 · Jun 21, 2022 · Updated 3 years ago
- Immutant adapter for Luminus ★10 · Sep 12, 2020 · Updated 5 years ago
- Graph-based and Transition-based dependency parsers based on BiLSTMs ★30 · Jan 4, 2019 · Updated 7 years ago
- Code for the 2019 TACL Paper "Trick Me If You Can: Human-in-the-loop Generation of Adversarial Question Answering Examples" ★36 · Jul 3, 2019 · Updated 6 years ago
- Implements the SM3-II adaptive optimization algorithm for PyTorch. ★33 · Sep 3, 2024 · Updated last year
- Reinforcement Learning with Latent Flow ★44 · Mar 25, 2021 · Updated 5 years ago
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights ★19 · Oct 9, 2022 · Updated 3 years ago
- A conda-smithy repository for jaxlib. ★17 · Updated this week
- Nonequispaced FFTs on GPUs (based on NFFT: http://www.nfft.org) ★11 · Apr 30, 2018 · Updated 7 years ago
- ★15 · Dec 5, 2019 · Updated 6 years ago
- ★14 · Jun 26, 2019 · Updated 6 years ago
- Unofficially Implements https://arxiv.org/abs/2112.05682 to get Linear Memory Cost on Attention for PyTorch ★12 · Jan 16, 2022 · Updated 4 years ago
- Reference implementation of algorithms for reinforcement learning and Markov decision processes. ★12 · Jan 28, 2021 · Updated 5 years ago
- Code and analyses related to the ExaLearn drug design efforts ★11 · Sep 30, 2020 · Updated 5 years ago
- JAX implementation of Graph Attention Networks ★13 · Jan 29, 2022 · Updated 4 years ago
- Norm-Based Curriculum Learning for Neural Machine Translation (ACL 2020) ★18 · Aug 1, 2020 · Updated 5 years ago
- Source code of paper "BP-Transformer: Modelling Long-Range Context via Binary Partitioning" ★127 · Apr 5, 2021 · Updated 4 years ago