Code publication to the paper "Normalized Attention Without Probability Cage"
☆17 · Nov 9, 2021 · Updated 4 years ago
Alternatives and similar repositories for normalized-attention
Users that are interested in normalized-attention are comparing it to the libraries listed below.
- ☆21 · Mar 15, 2023 · Updated 3 years ago
- Suite of 500 procedurally generated NLP tasks to study language model adaptability ☆21 · Jul 16, 2022 · Updated 3 years ago
- A GPT, made only of MLPs, in JAX ☆59 · Jun 23, 2021 · Updated 4 years ago
- 🖼️📊 ☆11 · Jun 9, 2020 · Updated 5 years ago
- A simple Transformer where the softmax has been replaced with normalization ☆20 · Sep 11, 2020 · Updated 5 years ago
- ☆12 · Mar 16, 2022 · Updated 4 years ago
- ☆13 · Nov 12, 2018 · Updated 7 years ago
- A JAX nn library ☆21 · Sep 9, 2025 · Updated 7 months ago
- Implementation of the multi-branch attentive Transformer (MAT). ☆33 · Aug 27, 2020 · Updated 5 years ago
- Implementation of Token Shift GPT: an autoregressive model that relies solely on shifting the sequence space for mixing ☆49 · Jan 27, 2022 · Updated 4 years ago
- A Python library for highly configurable transformers, easing model architecture search and experimentation. ☆48 · Nov 30, 2021 · Updated 4 years ago
- [EMNLP 2022] Language Model Pre-Training with Sparse Latent Typing ☆14 · Feb 10, 2023 · Updated 3 years ago
- Provides a LiveReload.js-compatible server as a Boot task ☆11 · Nov 28, 2017 · Updated 8 years ago
- Maximal Mutual Information (MMI) Tagger ☆26 · Jun 6, 2019 · Updated 6 years ago
- A Clojure library for deconstructing Korean Unicode syllable characters into alphabet characters ☆10 · Nov 22, 2021 · Updated 4 years ago
- ☆21 · Mar 14, 2021 · Updated 5 years ago
- Usable implementation of Emerging Symbol Binding Network (ESBN), in PyTorch ☆25 · Jan 6, 2021 · Updated 5 years ago
- The codebase for Causal Distillation for Language Models (NAACL '22) ☆26 · May 1, 2022 · Updated 4 years ago
- Randomized Smoothing of All Shapes and Sizes (ICML 2020). ☆51 · Jul 23, 2020 · Updated 5 years ago
- We got a stew going! ☆27 · Oct 3, 2023 · Updated 2 years ago
- An implementation of the 2021 paper by Geoffrey Hinton, "How to represent part-whole hierarchies in a neural network", in PyTorch. ☆57 · Mar 29, 2021 · Updated 5 years ago
- Colab TPU-compatible XLNet model code ☆12 · Feb 14, 2020 · Updated 6 years ago
- ☆42 · May 18, 2020 · Updated 5 years ago
- Informal exposition of Weisfeiler-Leman similarity ☆28 · Apr 30, 2021 · Updated 5 years ago
- ☆11 · Jun 21, 2022 · Updated 3 years ago
- 👨‍💻 my solutions to interview questions ☆10 · May 14, 2016 · Updated 9 years ago
- Implements the SM3-II adaptive optimization algorithm for PyTorch. ☆33 · Sep 3, 2024 · Updated last year
- Reinforcement Learning with Latent Flow ☆44 · Mar 25, 2021 · Updated 5 years ago
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights ☆19 · Oct 9, 2022 · Updated 3 years ago
- [ACL 2025] Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models ☆36 · Nov 4, 2025 · Updated 6 months ago
- ☆11 · Nov 11, 2023 · Updated 2 years ago
- Non-invasive wearable circadian rhythm telemonitoring sensors ☆17 · Apr 16, 2024 · Updated 2 years ago
- ☆15 · Dec 5, 2019 · Updated 6 years ago
- Companion repository to "Prompt Compression and Contrastive Conditioning for Controllability and Toxicity Reduction in Language Models" ☆14 · May 31, 2023 · Updated 2 years ago
- ☆14 · Jun 26, 2019 · Updated 6 years ago
- Unofficially implements https://arxiv.org/abs/2112.05682 to get linear memory cost on attention in PyTorch ☆12 · Jan 16, 2022 · Updated 4 years ago
- Open Korean Text Processor wrapper for Clojure ☆10 · Dec 1, 2018 · Updated 7 years ago
- Implementation of the Triangle Multiplicative module, used in AlphaFold2 as an efficient way to mix rows or columns of a 2d feature map, … ☆39 · Aug 3, 2021 · Updated 4 years ago
- Code and analyses related to the ExaLearn drug design efforts ☆11 · Sep 30, 2020 · Updated 5 years ago