jxiw / BiGSLinks

Official Repository of Pretraining Without Attention (BiGS), BiGS is the first model to achieve BERT-level transfer learning on the GLUE benchmark with subquadratic complexity in length (or without attention).

☆114

Alternatives and similar repositories for BiGS

Users that are interested in BiGS are comparing it to the libraries listed below

Sorting:

McGill-NLP / length-generalization
Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers", NeurIPS 2023
☆136Updated last year
microsoft / mutransformers
some common Huggingface transformers in maximal update parametrization (µP)
☆86Updated 3 years ago
EleutherAI / improved-t5
Experiments for efforts to train a new and improved t5
☆75Updated last year
KaiNylund / lm-weights-encode-time
☆69Updated last year
google-research / jestimator
Amos optimizer with JEstimator lib.
☆82Updated last year
hadasah / btm
☆76Updated last year
epfml / DenseFormer
☆81Updated last year
HazyResearch / TART
TART: A plug-and-play Transformer module for task-agnostic reasoning
☆200Updated 2 years ago
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48Updated last year
srush / LLM-Talk
☆52Updated last year
lucidrains / memory-editable-transformer
My explorations into editing the knowledge and memories of an attention network
☆34Updated 2 years ago
srush / do-we-need-attention
☆166Updated 2 years ago
lucidrains / pause-transformer
Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount…
☆52Updated 2 years ago
kernelmachine / cbtm
Code repository for the c-BTM paper
☆107Updated 2 years ago
ClashLuke / tpucare
Automatically take good care of your preemptible TPUs
☆37Updated 2 years ago
lucidrains / simple-hierarchical-transformer
Experiments around a simple idea for inducing multiple hierarchical predictive model within a GPT
☆222Updated last year
PiotrNawrot / dynamic-pooling
Efficient Transformers with Dynamic Token Pooling
☆64Updated 2 years ago
ekinakyurek / google-research
Google Research
☆46Updated 2 years ago
lucidrains / gated-state-spaces-pytorch
Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in Pytorch
☆101Updated 2 years ago
EleutherAI / rnngineering
Engineering the state of RNN language models (Mamba, RWKV, etc.)
☆32Updated last year
dvruette / barrel-rec-pytorch
☆53Updated last year
BlinkDL / SmallInitEmb
LayerNorm(SmallInit(Embedding)) in a Transformer to improve convergence
☆58Updated 3 years ago
shikaiqiu / compute-better-spent
☆58Updated last year
kyleliang919 / Long-context-transformers
Exploring finetuning public checkpoints on filter 8K sequences on Pile
☆115Updated 2 years ago
guy-dar / embedding-space
☆55Updated 2 years ago
codekansas / rwkv
RWKV model implementation
☆38Updated 2 years ago
crowsonkb / LDLM
Latent Diffusion Language Models
☆68Updated 2 years ago
lucidrains / mirasol-pytorch
Implementation of 🌻 Mirasol, SOTA Multimodal Autoregressive model out of Google Deepmind, in Pytorch
☆88Updated last year
RobertCsordas / moe
Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers"
☆38Updated 4 months ago
lucidrains / gateloop-transformer
Implementation of GateLoop Transformer in Pytorch and Jax
☆90Updated last year