cat-state / tinypar
☆20 · Updated last year
Alternatives and similar repositories for tinypar
Users interested in tinypar are comparing it to the libraries listed below.
- ☆78 · Updated 10 months ago
- Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)* ☆82 · Updated last year
- ☆22 · Updated last year
- Exploring finetuning public checkpoints on filtered 8K sequences from the Pile ☆114 · Updated 2 years ago
- Experiment of using Tangent to autodiff triton ☆79 · Updated last year
- Fast, Modern, Memory Efficient, and Low Precision PyTorch Optimizers ☆93 · Updated 10 months ago
- ☆59 · Updated 3 years ago
- HomebrewNLP in JAX flavour for maintainable TPU training ☆50 · Updated last year
- Minimal but scalable implementation of large language models in JAX ☆34 · Updated 7 months ago
- some common Huggingface transformers in maximal update parametrization (µP) ☆80 · Updated 3 years ago
- ☆108 · Updated last year
- A library for squeakily cleaning and filtering language datasets. ☆46 · Updated last year
- ☆52 · Updated last year
- supporting pytorch FSDP for optimizers ☆79 · Updated 5 months ago
- A toolkit for scaling law research ☆49 · Updated 4 months ago
- ☆49 · Updated last year
- Automatically take good care of your preemptible TPUs ☆36 · Updated 2 years ago
- Demonstration that finetuning a RoPE model on sequences longer than those seen in pre-training adapts the model's context limit ☆62 · Updated last year
- Transformer with Mu-Parameterization, implemented in Jax/Flax. Supports FSDP on TPU pods. ☆30 · Updated last week
- Language models scale reliably with over-training and on downstream tasks ☆97 · Updated last year
- Code for the examples presented in the talk "Training a Llama in your backyard: fine-tuning very large models on consumer hardware" given… ☆14 · Updated last year
- ☆53 · Updated last year
- Triton Implementation of HyperAttention Algorithm ☆48 · Updated last year
- A place to store reusable transformer components of my own creation or found on the interwebs ☆56 · Updated 3 weeks ago
- Code for the note "NF4 Isn't Information Theoretically Optimal (and that's Good)" ☆19 · Updated last year
- This repo is based on https://github.com/jiaweizzhao/GaLore ☆28 · Updated 8 months ago
- Scaling is a distributed training library and installable dependency designed to scale up neural networks, with a dedicated module for tr… ☆59 · Updated 7 months ago
- Easily run PyTorch on multiple GPUs & machines ☆46 · Updated 2 months ago
- A fusion of a linear layer and a cross entropy loss, written for pytorch in triton. ☆67 · Updated 10 months ago
- Machine Learning eXperiment Utilities ☆46 · Updated 11 months ago