lucidrains / PaLM-jaxLinks

Implementation of the specific Transformer architecture from PaLM - Scaling Language Modeling with Pathways - in Jax (Equinox framework)

☆187

Alternatives and similar repositories for PaLM-jax

Users that are interested in PaLM-jax are comparing it to the libraries listed below

Sorting:

google-research / jestimator
Amos optimizer with JEstimator lib.
☆82Updated last year
sholtodouglas / scalingExperiments
☆61Updated 3 years ago
Sea-Snell / JAXSeq
Train very large language models in Jax.
☆205Updated last year
HomebrewML / HomebrewNLP-torch
A case study of efficient training of large language models using commodity hardware.
☆68Updated 3 years ago
lucidrains / flash-attention-jax
Implementation of Flash Attention in Jax
☆215Updated last year
Sea-Snell / JAX_llama
Inference code for LLaMA models in JAX
☆118Updated last year
huggingface / bloom-jax-inference
☆67Updated 3 years ago
xrsrke / pipegoose
Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*
☆86Updated last year
HomebrewML / Olmax
HomebrewNLP in JAX flavour for maintable TPU-Training
☆50Updated last year
google / flaxformer
☆361Updated last year
ayaka14732 / llama-2-jax
JAX implementation of the Llama 2 model
☆219Updated last year
kingoflolz / swarm-jax
Swarm training framework using Haiku + JAX + Ray for layer parallel transformer language models on unreliable, heterogeneous nodes
☆241Updated 2 years ago
davisyoshida / lorax
LoRA for arbitrary JAX models and functions
☆140Updated last year
microsoft / mutransformers
some common Huggingface transformers in maximal update parametrization (µP)
☆82Updated 3 years ago
lucidrains / memory-editable-transformer
My explorations into editing the knowledge and memories of an attention network
☆35Updated 2 years ago
lucidrains / Mega-pytorch
Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena
☆204Updated last year
EleutherAI / pyfra
Python Research Framework
☆106Updated 2 years ago
google / praxis
☆187Updated this week
ClashLuke / tpucare
Automatically take good care of your preemptible TPUs
☆36Updated 2 years ago
lucidrains / ponder-transformer
Implementation of a Transformer that Ponders, using the scheme from the PonderNet paper
☆81Updated 3 years ago
kyleliang919 / Long-context-transformers
Exploring finetuning public checkpoints on filter 8K sequences on Pile
☆116Updated 2 years ago
zphang / minimal-opt
☆67Updated 2 years ago
lucidrains / simple-hierarchical-transformer
Experiments around a simple idea for inducing multiple hierarchical predictive model within a GPT
☆215Updated 11 months ago
kingoflolz / CLIP_JAX
Contrastive Language-Image Pretraining
☆144Updated 2 years ago
yandex-research / DeDLOC
Official code for "Distributed Deep Learning in Open Collaborations" (NeurIPS 2021)
☆117Updated 3 years ago
borisdayma / clip-jax
Train vision models using JAX and 🤗 transformers
☆98Updated 3 months ago
ayaka14732 / jax-smi
JAX Synergistic Memory Inspector
☆177Updated last year
google-research / precondition
☆31Updated last month
jxbz / agd
Automatic gradient descent
☆207Updated 2 years ago
srush / do-we-need-attention
☆166Updated 2 years ago