tech-srl/layer_norm_expressivity_role

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/tech-srl/layer_norm_expressivity_role)

tech-srl / layer_norm_expressivity_role

Code for the paper "On the Expressivity Role of LayerNorm in Transformers' Attention" (Findings of ACL'2023)

☆60

Alternatives and similar repositories for layer_norm_expressivity_role

Users that are interested in layer_norm_expressivity_role are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

bdusell / stack-attention
View on GitHub
Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns"
☆18Mar 15, 2024Updated 2 years ago
yikangshen / megablocks
View on GitHub
☆20May 30, 2024Updated 2 years ago
giganticode / jemma
View on GitHub
JEMMA: An Extensible Java dataset for Many ML4Code Applications
☆19Dec 12, 2022Updated 3 years ago
Noahs-ARK / PaLM
View on GitHub
PyTorch implementation for PaLM: A Hybrid Parser and Language Model.
☆10Jan 7, 2020Updated 6 years ago
maximzubkov / fft-scan
View on GitHub
Efficient PScan implementation in PyTorch
☆17Jan 2, 2024Updated 2 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
emorynlp / seq2seq-corenlp
View on GitHub
☆13Feb 7, 2023Updated 3 years ago
LZhengisme / CODA
View on GitHub
Implementation of Cascaded Head-colliding Attention (ACL'2021)
☆11Sep 16, 2021Updated 4 years ago
srush / tangent
View on GitHub
Source-to-Source Debuggable Derivatives in Pure Python
☆15Jan 23, 2024Updated 2 years ago
rycolab / aflt-f2023
View on GitHub
Advanced Formal Language Theory (263-5352-00L; Frühjahr 2023)
☆10Feb 21, 2023Updated 3 years ago
catid / spectral_ssm
View on GitHub
Implementation of Spectral State Space Models
☆16Feb 23, 2024Updated 2 years ago
teffland / ner-expected-entity-ratio
View on GitHub
Implementation and experiments for Partially Supervised NER via Expected Entity Ratio in TACL 2022
☆14Nov 7, 2022Updated 3 years ago
tech-srl / c3po
View on GitHub
Code for the paper "A Structural Model for Contextual Code Changes"
☆32Oct 25, 2023Updated 2 years ago
dangxingyu / rnn-icrag
View on GitHub
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
☆27Apr 17, 2024Updated 2 years ago
glassroom / heinsen_attention
View on GitHub
Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)
☆25Jun 6, 2024Updated 2 years ago
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
proger / nanokitchen
View on GitHub
Parallel Associative Scan for Language Models
☆18Jan 8, 2024Updated 2 years ago
RakitinDen / pytorch-recursive-gumbel-max-trick
View on GitHub
Leveraging Recursive Gumbel-Max Trick for Approximate Inference in Combinatorial Spaces, NeurIPS 2021
☆14Dec 11, 2021Updated 4 years ago
LouChao98 / nner_as_parsing
View on GitHub
☆16Mar 22, 2023Updated 3 years ago
zhangjiong724 / spectral-RNN
View on GitHub
STABILIZING GRADIENTS FOR DEEP NEURAL NETWORKS VIA EFFICIENT SVD PARAMETERIZATION
☆16Jun 5, 2018Updated 8 years ago
subho406 / agalite
View on GitHub
AGaLiTe: Approximate Gated Linear Transformers for Online Reinforcement Learning (Published in TMLR)
☆24Oct 15, 2024Updated last year
radarFudan / mamba-minimal-jax
View on GitHub
☆36Nov 22, 2024Updated last year
acosharma / elita-transformer
View on GitHub
Official Repository for Efficient Linear-Time Attention Transformers.
☆18Jun 2, 2024Updated 2 years ago
OpenNLPLab / HGRN2
View on GitHub
HGRN2: Gated Linear RNNs with State Expansion
☆58Aug 20, 2024Updated last year
HazyResearch / embroid
View on GitHub
Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification
☆11Aug 12, 2023Updated 2 years ago
End-to-end encrypted email - Proton Mail • Ad
Special offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
radarFudan / Curse-of-memory
View on GitHub
Curse-of-memory phenomenon of RNNs in sequence modelling
☆19May 8, 2025Updated last year
sowmaster / esjacobians
View on GitHub
Implementations of the algorithms described in the paper: On the Convergence Theory for Hessian-Free Bilevel Algorithms.
☆11Nov 1, 2024Updated last year
cyk1337 / Highway-Transformer
View on GitHub
[ACL‘20] Highway Transformer: A Gated Transformer.
☆33Dec 5, 2021Updated 4 years ago
top-yun / SPARK
View on GitHub
A benchmark dataset and simple code examples for measuring the perception and reasoning of multi-sensor Vision Language models.
☆19Dec 27, 2024Updated last year
Doraemonzzz / nanoTransNormer
View on GitHub
☆11Oct 11, 2023Updated 2 years ago
RUCAIBox / MPOP
View on GitHub
☆13Jun 16, 2021Updated 5 years ago
da03 / criticize_text_generation
View on GitHub
A method for evaluating the high-level coherence of machine-generated texts. Identifies high-level coherence issues in transformer-based …
☆12Mar 18, 2023Updated 3 years ago
Doraemonzzz / tnn-pytorch
View on GitHub
☆20Apr 17, 2023Updated 3 years ago
berlino / gated_linear_attention
View on GitHub
☆107Mar 9, 2024Updated 2 years ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
Aleph-Alpha-Research / NeurIPS-WANT-submission-efficient-parallelization-layouts
View on GitHub
☆22Dec 15, 2023Updated 2 years ago
jenni-ai / T2FW
View on GitHub
Fine-Tuning Pre-trained Transformers into Decaying Fast Weights
☆20Oct 9, 2022Updated 3 years ago
whyNLP / Probabilistic-Transformer
View on GitHub
A probabilitic model for contextual word representation. Accepted to ACL2023 Findings.
☆26Oct 22, 2023Updated 2 years ago
Benjamin-Walker / selective-ssms-and-linear-cdes
View on GitHub
Code for "Theoretical Foundations of Deep Selective State-Space Models" (NeurIPS 2024)
☆17Jan 7, 2025Updated last year
Doraemonzzz / hgru-pytorch
View on GitHub
☆29Jul 9, 2024Updated 2 years ago
GindaChen / FlexFlashAttention3
View on GitHub
FlexAttention w/ FlashAttention3 Support
☆27Oct 5, 2024Updated last year
lucidrains / token-shift-gpt
View on GitHub
Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing
☆49Jan 27, 2022Updated 4 years ago