thecharlieblake / lovely-llama
An implementation of the Llama architecture, to instruct and delight
☆21 · May 31, 2025 · Updated 8 months ago
Alternatives and similar repositories for lovely-llama
Users who are interested in lovely-llama are comparing it to the libraries listed below.
- Flexibly track outputs and grad-outputs of torch.nn.Module. ☆13 · Oct 6, 2023 · Updated 2 years ago
- ☆23 · Jun 18, 2024 · Updated last year
- Easily run PyTorch on multiple GPUs & machines ☆58 · Jan 8, 2026 · Updated last month
- A repo to do interpretability of pre-trained acoustic models ☆15 · Oct 15, 2023 · Updated 2 years ago
- Standalone commandline CLI tool for compiling Triton kernels ☆20 · Sep 13, 2024 · Updated last year
- ☆15 · Oct 30, 2025 · Updated 3 months ago
- ☆12 · Jan 4, 2024 · Updated 2 years ago
- JAX implementation of the Mistral 7b v0.2 model ☆35 · Jul 3, 2024 · Updated last year
- ☆18 · Mar 18, 2024 · Updated last year
- Minimalistic, hackable PyTorch implementation of SimSiam in ~400 lines. Achieves good performance on ImageNet with ResNet50. Features dis… ☆21 · Nov 25, 2024 · Updated last year
- ☆18 · Jan 16, 2026 · Updated last month
- ☆16 · Oct 20, 2025 · Updated 3 months ago
- A fusion of a linear layer and a cross entropy loss, written for pytorch in triton. ☆75 · Aug 2, 2024 · Updated last year
- seqax = sequence modeling + JAX ☆170 · Jul 23, 2025 · Updated 6 months ago
- ☆19 · Dec 4, 2025 · Updated 2 months ago
- ☆47 · Jan 18, 2024 · Updated 2 years ago
- Turn jitted jax functions back into python source code ☆23 · Dec 16, 2024 · Updated last year
- ☆53 · May 20, 2024 · Updated last year
- ☆21 · Mar 3, 2025 · Updated 11 months ago
- A toolkit for scaling law research ⚖ ☆57 · Jan 27, 2025 · Updated last year
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ☆27 · Oct 13, 2024 · Updated last year
- Experimental GPU language with meta-programming ☆25 · Sep 6, 2024 · Updated last year
- JAX bindings for Flash Attention v2 ☆103 · Feb 5, 2026 · Updated last week
- Code repository for the public reproduction of the language modelling experiments on "MatFormer: Nested Transformer for Elastic Inference… ☆31 · Nov 14, 2023 · Updated 2 years ago
- Transformer with Mu-Parameterization, implemented in Jax/Flax. Supports FSDP on TPU pods. ☆32 · Jun 5, 2025 · Updated 8 months ago
- ☆28 · Jan 17, 2025 · Updated last year
- Universal Neurons in GPT2 Language Models ☆30 · May 28, 2024 · Updated last year
- A high-resolution implementation of HiFi-GAN Vocoder for Voice Conversion. ☆32 · Apr 10, 2023 · Updated 2 years ago
- FlexAttention w/ FlashAttention3 Support ☆27 · Oct 5, 2024 · Updated last year
- ☆34 · Sep 10, 2024 · Updated last year
- ☆28 · Nov 18, 2022 · Updated 3 years ago
- PyTorch half precision gemm lib w/ fused optional bias + optional relu/gelu ☆78 · Dec 3, 2024 · Updated last year
- ☆28 · Dec 14, 2021 · Updated 4 years ago
- ☆33 · Nov 4, 2024 · Updated last year
- Jax like function transformation engine but micro, microjax ☆34 · Oct 25, 2024 · Updated last year
- [NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling ☆40 · Dec 2, 2023 · Updated 2 years ago
- gzip Predicts Data-dependent Scaling Laws ☆34 · May 28, 2024 · Updated last year
- ☆34 · May 14, 2025 · Updated 9 months ago
- supporting pytorch FSDP for optimizers ☆84 · Dec 8, 2024 · Updated last year