cat-state / tinyparLinks

☆20

Alternatives and similar repositories for tinypar

Users that are interested in tinypar are comparing it to the libraries listed below

Sorting:

xrsrke / pipegoose
Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*
☆87Updated last year
euclaise / supertrainer2000
☆50Updated last year
HomebrewML / Olmax
HomebrewNLP in JAX flavour for maintable TPU-Training
☆51Updated last year
sholtodouglas / scalingExperiments
☆62Updated 3 years ago
cloneofsimo / min-fsdp
☆91Updated last year
warner-benjamin / optimi
Fast, Modern, and Low Precision PyTorch Optimizers
☆116Updated 3 months ago
kyleliang919 / Long-context-transformers
Exploring finetuning public checkpoints on filter 8K sequences on Pile
☆116Updated 2 years ago
yixiaoer / tpux
A set of Python scripts that makes your experience on TPU better
☆54Updated 2 months ago
thecharlieblake / lovely-llama
An implementation of the Llama architecture, to instruct and delight
☆21Updated 6 months ago
ayaka14732 / llama-2-jax
JAX implementation of the Llama 2 model
☆216Updated last year
microsoft / mutransformers
some common Huggingface transformers in maximal update parametrization (µP)
☆87Updated 3 years ago
kaiokendev / cutoff-len-is-context-len
Demonstration that finetuning RoPE model on larger sequences than the pre-trained model adapts the model context limit
☆63Updated 2 years ago
mgmalek / efficient_cross_entropy
☆121Updated last year
dvruette / barrel-rec-pytorch
☆53Updated last year
EleutherAI / improved-t5
Experiments for efforts to train a new and improved t5
☆76Updated last year
srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆80Updated last year
kernelmachine / cbtm
Code repository for the c-BTM paper
☆108Updated 2 years ago
berlino / seq_icl
☆53Updated last year
mlfoundations / scaling
Language models scale reliably with over-training and on downstream tasks
☆100Updated last year
cloneofsimo / min-max-gpt
Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training
☆132Updated last year
google-deepmind / asyncdiloco
☆47Updated last year
JonasGeiping / linear_cross_entropy_loss
A fusion of a linear layer and a cross entropy loss, written for pytorch in triton.
☆73Updated last year
kyo-takano / chinchilla
A toolkit for scaling law research ⚖
☆53Updated 10 months ago
lucidrains / pause-transformer
Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount…
☆53Updated 2 years ago
Edward-Sun / gpt-accelera
Simple and efficient pytorch-native transformer training and inference (batched)
☆79Updated last year
ClashLuke / tpucare
Automatically take good care of your preemptible TPUs
☆37Updated 2 years ago
Zyphra / Zyda_processing
☆39Updated last year
ethansmith2000 / TransformerExperiments
☆19Updated this week
drisspg / transformer_nuggets
A place to store reusable transformer components of my own creation or found on the interwebs
☆62Updated 2 weeks ago
siyan-zhao / prepacking
The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …
☆60Updated last year