ethansmith2000 / TransformerExperiments
☆18 · Updated last month
Related projects
Alternatives and complementary repositories for TransformerExperiments
- ☆31 · Updated 2 months ago
- Engineering the state of RNN language models (Mamba, RWKV, etc.) ☆32 · Updated 5 months ago
- Utilities for PyTorch distributed ☆23 · Updated last year
- Efficient optimizers ☆42 · Updated this week
- An implementation of the Llama architecture, to instruct and delight ☆21 · Updated 2 months ago
- ☆20 · Updated last year
- Automatically take good care of your preemptible TPUs ☆31 · Updated last year
- Latent Diffusion Language Models ☆67 · Updated last year
- ☆46 · Updated last month
- GoldFinch and other hybrid transformer components ☆39 · Updated 3 months ago
- ☆76 · Updated 6 months ago
- Triton Implementation of HyperAttention Algorithm ☆46 · Updated 10 months ago
- Efficient PScan implementation in PyTorch ☆15 · Updated 10 months ago
- ☆27 · Updated 6 months ago
- ☆61 · Updated 2 months ago
- ☆18 · Updated this week
- ☆72 · Updated 4 months ago
- ☆24 · Updated 8 months ago
- Experiment of using Tangent to autodiff triton ☆71 · Updated 9 months ago
- An implementation of PSGD Kron second-order optimizer for PyTorch ☆15 · Updated this week
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆83 · Updated last week
- ☆53 · Updated 9 months ago
- Train vision models using JAX and 🤗 transformers ☆95 · Updated 2 weeks ago
- Collection of autoregressive model implementation ☆66 · Updated this week
- Implementation of GateLoop Transformer in Pytorch and Jax ☆86 · Updated 4 months ago
- Parallel Associative Scan for Language Models ☆18 · Updated 10 months ago
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024) ☆23 · Updated 5 months ago
- Transformer with Mu-Parameterization, implemented in Jax/Flax. Supports FSDP on TPU pods. ☆29 · Updated last week
- Exploration into the proposed "Self Reasoning Tokens" by Felipe Bonetto ☆53 · Updated 5 months ago