PAIR-code / tiny-transformers

☆22

Alternatives and similar repositories for tiny-transformers

Users who are interested in tiny-transformers are comparing it to the libraries listed below.

lucidrains / einops-exts
Implementation of some personal helper functions for Einops, my most favorite tensor manipulation library ❤️
☆55 Updated 2 years ago
crowsonkb / dice-mc
DiCE: The Infinitely Differentiable Monte-Carlo Estimator
☆32 Updated 2 years ago
eth-easl / fmengine
Utilities for Training Very Large Models
☆58 Updated last year
ColinQiyangLi / AdaCat
AdaCat
☆49 Updated 3 years ago
lucidrains / token-shift-gpt
Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing
☆50 Updated 3 years ago
tech-srl / layer_norm_expressivity_role
Code for the paper "On the Expressivity Role of LayerNorm in Transformers' Attention" (Findings of ACL'2023)
☆57 Updated last year
crowsonkb / LDLM
Latent Diffusion Language Models
☆69 Updated 2 years ago
facebookresearch / adaptive_scheduling
Experimental scripts for researching data adaptive learning rate scheduling.
☆22 Updated 2 years ago
lucidrains / esbn-transformer
An attempt to merge ESBN with Transformers, to endow Transformers with the ability to emergently bind symbols
☆16 Updated 4 years ago
lucidrains / metaformer-gpt
Implementation of Metaformer, but in an autoregressive manner
☆26 Updated 3 years ago
HomebrewML / HomebrewNLP-torch
A case study of efficient training of large language models using commodity hardware.
☆68 Updated 3 years ago
learning-at-home / lean_transformer
Memory-efficient transformer. Work in progress.
☆19 Updated 3 years ago
TomFrederik / grokking
Re-implementation of 'Grokking: Generalization beyond overfitting on small algorithmic datasets'
☆39 Updated 3 years ago
lucidrains / holodeck-pytorch
Implementation of a holodeck, written in Pytorch
☆18 Updated 2 years ago
modal-labs / ci-on-modal
A sample pattern for running CI tests on Modal
☆18 Updated 6 months ago
Qualcomm-AI-research / codeit
☆27 Updated last year
AhmedImtiazPrio / grok-adversarial
Deep Networks Grok All the Time and Here is Why
☆37 Updated last year
crypdick / timm-lr-scheduler-explorer
A dashboard for exploring timm learning rate schedulers
☆19 Updated 11 months ago
lucidrains / quartic-transformer
Exploring an idea where one forgets about efficiency and carries out attention across each edge of the nodes (tokens)
☆55 Updated 7 months ago
Aleph-Alpha-Research / trigrams
☆57 Updated last month
lucidrains / autoregressive-linear-attention-cuda
CUDA implementation of autoregressive linear attention, with all the latest research findings
☆45 Updated 2 years ago
srush / drop7
☆18 Updated last year
keyonvafa / world-model-evaluation
☆68 Updated 11 months ago
ekinakyurek / google-research
Google Research
☆46 Updated 3 years ago
codekansas / rwkv
RWKV model implementation
☆38 Updated 2 years ago
cloneofsimo / zeroshampoo
☆34 Updated last year
lucidrains / tableformer-pytorch
Implementation of TableFormer, Robust Transformer Modeling for Table-Text Encoding, in Pytorch
☆39 Updated 3 years ago
ahthie7u / cockpit
Code for the anonymous submission "Cockpit: A Practical Debugging Tool for Training Deep Neural Networks"
☆31 Updated 4 years ago
lucidrains / ESBN-pytorch
Usable implementation of Emerging Symbol Binding Network (ESBN), in Pytorch
☆25 Updated 4 years ago
lucidrains / panoptic-transformer
Another attempt at a long-context / efficient transformer by me
☆38 Updated 3 years ago