HomebrewNLP / Olmax

HomebrewNLP in JAX flavour for maintainable TPU training

☆47

Alternatives and similar repositories for Olmax:

Users who are interested in Olmax are comparing it to the repositories listed below.

HomebrewNLP / HomebrewNLP
A case study of efficient training of large language models using commodity hardware.
☆68 Updated 2 years ago
ClashLuke / tpucare
Automatically take good care of your preemptible TPUs
☆34 Updated last year
sholtodouglas / scalingExperiments
☆58 Updated 2 years ago
lessw2020 / transformer_central
Various transformers for FSDP research
☆34 Updated 2 years ago
warner-benjamin / optimi
Fast, Modern, Memory Efficient, and Low Precision PyTorch Optimizers
☆77 Updated 6 months ago
ClashLuke / TrueGrad
PyTorch interface for TrueGrad Optimizers
☆41 Updated last year
lucidrains / token-shift-gpt
Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing
☆47 Updated 2 years ago
crowsonkb / LDLM
Latent Diffusion Language Models
☆68 Updated last year
crowsonkb / torch-dist-utils
Utilities for PyTorch distributed
☆23 Updated last year
lucidrains / panoptic-transformer
Another attempt at a long-context / efficient transformer by me
☆37 Updated 2 years ago
cat-state / tinypar
☆20 Updated last year
lucidrains / memory-editable-transformer
My explorations into editing the knowledge and memories of an attention network
☆34 Updated 2 years ago
tensorfork / OBST
Your fruity companion for transformers
☆14 Updated 2 years ago
cloneofsimo / zeroshampoo
☆33 Updated 4 months ago
google-research-datasets / QAmeleon
QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P…
☆34 Updated last year
borisdayma / clip-jax
Train vision models using JAX and 🤗 transformers
☆95 Updated this week
microsoft / mutransformers
some common Huggingface transformers in maximal update parametrization (µP)
☆78 Updated 2 years ago
NathanGodey / headless-lm
Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/…
☆24 Updated 9 months ago
ethansmith2000 / fsdp_optimizers
supporting pytorch FSDP for optimizers
☆75 Updated last month
google-research / precondition
☆30 Updated 3 weeks ago
cloneofsimo / min-fsdp
☆75 Updated 6 months ago
lucaslingle / mu_transformer
Transformer with Mu-Parameterization, implemented in Jax/Flax. Supports FSDP on TPU pods.
☆30 Updated last month
google-research / jestimator
Amos optimizer with JEstimator lib.
☆81 Updated 8 months ago
Rallio67 / language-model-agents
Experiments with generating opensource language model assistants
☆97 Updated last year
davisyoshida / easy-lora-and-gptq
JAX notebook showing how to LoRA + GPTQ arbitrary models
☆10 Updated last year
lucidrains / ponder-transformer
Implementation of a Transformer that Ponders, using the scheme from the PonderNet paper
☆80 Updated 3 years ago
krandiash / quinine
A library to create and manage configuration files, especially for machine learning projects.
☆76 Updated 2 years ago
kyleliang919 / Long-context-transformers
Exploring finetuning public checkpoints on filter 8K sequences on Pile
☆115 Updated last year
dvruette / barrel-rec-pytorch
☆53 Updated 11 months ago