McGill-NLP/length-generalization

Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers", NeurIPS 2023

☆139

Alternatives and similar repositories for length-generalization

Users who are interested in length-generalization are comparing it to the repositories listed below.

chijames / KERPLE
☆20 · Oct 25, 2022 · Updated 3 years ago
jenni-ai / T2FW
Fine-Tuning Pre-trained Transformers into Decaying Fast Weights
☆20 · Oct 9, 2022 · Updated 3 years ago
bdusell / stack-attention
Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns"
☆18 · Mar 15, 2024 · Updated 2 years ago
EleutherAI / rnngineering
Engineering the state of RNN language models (Mamba, RWKV, etc.)
☆33 · May 25, 2024 · Updated 2 years ago
dangxingyu / rnn-icrag
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
☆27 · Apr 17, 2024 · Updated 2 years ago
maximzubkov / fft-scan
Efficient PScan implementation in PyTorch
☆17 · Jan 2, 2024 · Updated 2 years ago
acosharma / elita-transformer
Official Repository for Efficient Linear-Time Attention Transformers.
☆18 · Jun 2, 2024 · Updated 2 years ago
whyNLP / LCKV
Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance…
☆157 · Apr 7, 2025 · Updated last year
teffland / ner-expected-entity-ratio
Implementation and experiments for Partially Supervised NER via Expected Entity Ratio in TACL 2022
☆14 · Nov 7, 2022 · Updated 3 years ago
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48 · Dec 11, 2023 · Updated 2 years ago
rycolab / aflt-f2023
Advanced Formal Language Theory (263-5352-00L; Frühjahr 2023)
☆10 · Feb 21, 2023 · Updated 3 years ago
zhangjiong724 / spectral-RNN
STABILIZING GRADIENTS FOR DEEP NEURAL NETWORKS VIA EFFICIENT SVD PARAMETERIZATION
☆16 · Jun 5, 2018 · Updated 8 years ago
sunyt32 / torchscale
Transformers at any scale
☆42 · Jan 18, 2024 · Updated 2 years ago
OpenNLPLab / HGRN
[NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se…
☆68 · Apr 24, 2024 · Updated 2 years ago
LouChao98 / nner_as_parsing
☆16 · Mar 22, 2023 · Updated 3 years ago
yikangshen / megablocks
☆20 · May 30, 2024 · Updated 2 years ago
adihaviv / nopos
☆23 · Jul 27, 2023 · Updated 2 years ago
AndyShih12 / LongHorizonTemperatureScaling
PyTorch implementation for "Long Horizon Temperature Scaling", ICML 2023
☆21 · May 31, 2023 · Updated 3 years ago
proger / nanokitchen
Parallel Associative Scan for Language Models
☆18 · Jan 8, 2024 · Updated 2 years ago
sustcsonglin / disco-pointer
Official Implementation of ACL2023: Don't Parse, Choose Spans! Continuous and Discontinuous Constituency Parsing via Autoregressive Span …
☆14 · Aug 25, 2023 · Updated 2 years ago
zsLin177 / SRL-as-GP
☆18 · Mar 10, 2023 · Updated 3 years ago
Timothyxxx / NeuralSymbolicPapers
☆14 · Aug 18, 2022 · Updated 3 years ago
berlino / seq_icl
☆54 · May 20, 2024 · Updated 2 years ago
ermongroup / fast_feedforward_computation
Official code for "Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving", ICML 2021
☆30 · Sep 25, 2021 · Updated 4 years ago
robert-lieck / RBN
Recursive Bayesian Networks
☆11 · May 11, 2025 · Updated last year
snu-mllab / Context-Memory
Pytorch implementation for "Compressed Context Memory For Online Language Model Interaction" (ICLR'24)
☆63 · Apr 18, 2024 · Updated 2 years ago
OpenNLPLab / HGRN2
HGRN2: Gated Linear RNNs with State Expansion
☆58 · Aug 20, 2024 · Updated last year
machine-discovery / deer
Parallelizing non-linear sequential models over the sequence length
☆57 · Jun 23, 2025 · Updated last year
srush / ProbTalk
☆29 · Nov 30, 2021 · Updated 4 years ago
VPeterV / RankSpace-Models
source code for NAACL2022 main conference "Dynamic Programming in Rank Space: Scaling Structured Inference with Low-Rank HMMs and PCFGs"
☆10 · Sep 26, 2022 · Updated 3 years ago
Noahs-ARK / PaLM
PyTorch implementation for PaLM: A Hybrid Parser and Language Model.
☆10 · Jan 7, 2020 · Updated 6 years ago
subho406 / agalite
AGaLiTe: Approximate Gated Linear Transformers for Online Reinforcement Learning (Published in TMLR)
☆24 · Oct 15, 2024 · Updated last year
neulab / neural-lpcfg
The Return of Lexical Dependencies: Neural Lexicalized PCFGs (TACL)
☆33 · Sep 22, 2025 · Updated 10 months ago
expz / annotated-hyena
An annotated implementation of the Hyena Hierarchy paper
☆34 · May 28, 2023 · Updated 3 years ago
emorynlp / seq2seq-corenlp
☆13 · Feb 7, 2023 · Updated 3 years ago
VITA-Group / Data-Efficient-Scaling
[ICML 2023] "Data Efficient Neural Scaling Law via Model Reusing" by Peihao Wang, Rameswar Panda, Zhangyang Wang
☆14 · Jan 4, 2024 · Updated 2 years ago
Leooyii / LCEG
[COLM'25] A Controlled Study on Long Context Extension and Generalization in LLMs
☆65 · Mar 9, 2026 · Updated 4 months ago
rycolab / parsing-as-tagging
☆21 · Nov 19, 2023 · Updated 2 years ago
BryceZhuo / HybridNorm
The official implementation of HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
☆19 · Mar 7, 2025 · Updated last year