HazyResearch / safari

Convolutions for Sequence Modeling

☆893

Alternatives and similar repositories for safari

Users who are interested in safari are comparing it to the libraries listed below.

HazyResearch / H3
Language Modeling with the H3 State Space Model
☆520 · Updated last year
Liuhong99 / Sophia
The official implementation of “Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training”
☆964 · Updated last year
lucidrains / memorizing-transformers-pytorch
Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate …
☆634 · Updated 2 years ago
JonasGeiping / cramming
Cramming the training of a (BERT-type) language model into limited compute.
☆1,338 · Updated last year
lucidrains / MEGABYTE-pytorch
Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in Pytorch
☆646 · Updated 6 months ago
PiotrNawrot / nanoT5
Fast & Simple repository for pre-training and fine-tuning T5-style models
☆1,006 · Updated 11 months ago
srush / annotated-mamba
Annotated version of the Mamba paper
☆486 · Updated last year
kyegomez / Sophia
Effortless plugin and play Optimizer to cut model training costs by 50%. New optimizer that is 2x faster than Adam on LLMs.
☆379 · Updated last year
HazyResearch / m2
Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture"
☆556 · Updated 6 months ago
google-deepmind / tracr
☆540 · Updated last year
abertsch72 / unlimiformer
Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input"
☆1,062 · Updated last year
alasdairforsythe / tokenmonster
Ungreedy subword tokenizer and vocabulary trainer for Python, Go & Javascript
☆591 · Updated last year
microsoft / mup
maximal update parametrization (µP)
☆1,558 · Updated last year
srush / annotated-s4
Implementation of https://srush.github.io/annotated-s4
☆499 · Updated last month
pbelcak / UltraFastBERT
The repository for the code of the UltraFastBERT paper
☆516 · Updated last year
lucidrains / recurrent-memory-transformer-pytorch
Implementation of Recurrent Memory Transformer, Neurips 2022 paper, in Pytorch
☆411 · Updated 6 months ago
tysam-code / hlb-gpt
Minimalistic, extremely fast, and hackable researcher's toolbench for GPT models in 307 lines of code. Reaches <3.8 validation loss on wi…
☆348 · Updated 11 months ago
changjonathanc / minLoRA
minLoRA: a minimal PyTorch library that allows you to apply LoRA to any PyTorch model.
☆468 · Updated 2 years ago
lucidrains / PaLM-pytorch
Implementation of the specific Transformer architecture from PaLM - Scaling Language Modeling with Pathways
☆823 · Updated 2 years ago
explosion / curated-transformers
🤖 A PyTorch library of curated Transformer models and their composable components
☆892 · Updated last year
lucidrains / RETRO-pytorch
Implementation of RETRO, Deepmind's Retrieval based Attention net, in Pytorch
☆869 · Updated last year
stanford-crfm / levanter
Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax
☆617 · Updated this week
redotvideo / mamba-chat
Mamba-Chat: A chat LLM based on the state-space model architecture 🐍
☆927 · Updated last year
google-deepmind / recurrentgemma
Open weights language model from Google DeepMind, based on Griffin.
☆644 · Updated last month
syncdoth / RetNet
Huggingface compatible implementation of RetNet (Retentive Networks, https://arxiv.org/pdf/2307.08621.pdf) including parallel, recurrent,…
☆226 · Updated last year
princeton-nlp / MeZO
[NeurIPS 2023] MeZO: Fine-Tuning Language Models with Just Forward Passes. https://arxiv.org/abs/2305.17333
☆1,116 · Updated last year
jxbz / agd
Automatic gradient descent
☆208 · Updated 2 years ago
google-research / meliad
☆256 · Updated last month
google / learned_optimization
☆780 · Updated last month
apple / ml-sigma-reparam
☆304 · Updated last year