keunwoochoi / tokenizer-vs-tokenizer

☆14

Alternatives and similar repositories for tokenizer-vs-tokenizer

Users who are interested in tokenizer-vs-tokenizer are comparing it to the libraries listed below.

lucidrains / gateloop-transformer
Implementation of GateLoop Transformer in Pytorch and Jax
☆91 Updated last year
BlinkDL / SmallInitEmb
LayerNorm(SmallInit(Embedding)) in a Transformer to improve convergence
☆61 Updated 3 years ago
crowsonkb / LDLM
Latent Diffusion Language Models
☆70 Updated 2 years ago
lucidrains / pause-transformer
Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount…
☆53 Updated 2 years ago
epfml / DenseFormer
☆82 Updated last year
lucidrains / simplicial-attention
Implementation of 2-simplicial attention proposed by Clift et al. (2019) and the recent attempt to make practical in Fast and Simplex, Ro…
☆47 Updated 2 months ago
ethansmith2000 / TransformerExperiments
☆19 Updated 6 months ago
EleutherAI / rnngineering
Engineering the state of RNN language models (Mamba, RWKV, etc.)
☆32 Updated last year
shikaiqiu / compute-better-spent
☆61 Updated last year
lessw2020 / transformer_central
Various transformers for FSDP research
☆38 Updated 3 years ago
NathanGodey / headless-lm
Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/…
☆28 Updated last year
RobertCsordas / moe
Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers"
☆38 Updated 5 months ago
warner-benjamin / optimi
Fast, Modern, and Low Precision PyTorch Optimizers
☆116 Updated 2 months ago
catie-aq / flashT5
A fast implementation of T5/UL2 in PyTorch using Flash Attention
☆112 Updated last month
NX-AI / mlstm_kernels
Tiled Flash Linear Attention library for fast and efficient mLSTM Kernels.
☆74 Updated this week
lucidrains / perceiver-ar-pytorch
Implementation of Perceiver AR, Deepmind's new long-context attention network based on Perceiver architecture, in Pytorch
☆93 Updated 2 years ago
lucidrains / light-recurrent-unit-pytorch
Implementation of a Light Recurrent Unit in Pytorch
☆49 Updated last year
crowsonkb / torch-dist-utils
Utilities for PyTorch distributed
☆25 Updated 9 months ago
lucidrains / kalman-filtering-attention
Implementation of the Kalman Filtering Attention proposed in "Kalman Filtering Attention for User Behavior Modeling in CTR Prediction"
☆59 Updated 2 years ago
lucidrains / grokfast-pytorch
Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients"
☆103 Updated 11 months ago
lucidrains / taylor-series-linear-attention
Explorations into the recently proposed Taylor Series Linear Attention
☆100 Updated last year
drisspg / transformer_nuggets
A place to store reusable transformer components of my own creation or found on the interwebs
☆62 Updated last week
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48 Updated last year
lucidrains / transformer-lm-gan
Explorations into adversarial losses on top of autoregressive loss for language modeling
☆38 Updated 9 months ago
ClashLuke / tpucare
Automatically take good care of your preemptible TPUs
☆37 Updated 2 years ago
euclaise / supertrainer2000
☆50 Updated last year
lucidrains / gated-state-spaces-pytorch
Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in Pytorch
☆101 Updated 2 years ago
codekansas / rwkv
RWKV model implementation
☆38 Updated 2 years ago
HomebrewML / Olmax
HomebrewNLP in JAX flavour for maintainable TPU-Training
☆51 Updated last year
arogozhnikov / adamw_bfloat16
AdamW optimizer for bfloat16 models in pytorch 🔥.
☆38 Updated last year