andyehrenberg / flaxlm

☆27

Related projects: ⓘ

ClashLuke / tpucare
Automatically take good care of your preemptible TPUs
☆28Updated last year
HomebrewNLP / Olmax
HomebrewNLP in JAX flavour for maintable TPU-Training
☆46Updated 7 months ago
cat-state / tinypar
☆20Updated last year
srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆66Updated 7 months ago
young-geng / mlxu
Machine Learning eXperiment Utilities
☆42Updated 3 months ago
srush / mamba-primer
☆35Updated 5 months ago
sholtodouglas / scalingExperiments
☆56Updated 2 years ago
microsoft / mutransformers
some common Huggingface transformers in maximal update parametrization (µP)
☆76Updated 2 years ago
krandiash / quinine
A library to create and manage configuration files, especially for machine learning projects.
☆77Updated 2 years ago
berlino / seq_icl
☆48Updated 3 months ago
zphang / minimal-opt
☆67Updated 2 years ago
ethancaballero / broken_neural_scaling_laws
Code Release for "Broken Neural Scaling Laws" (BNSL) paper
☆57Updated 10 months ago
young-geng / scalax
A simple library for scaling up JAX programs
☆116Updated last month
google-research / precondition
☆27Updated 4 months ago
EleutherAI / rnngineering
Engineering the state of RNN language models (Mamba, RWKV, etc.)
☆31Updated 3 months ago
athms / mad-lab
A MAD laboratory to improve AI architecture designs 🧪
☆84Updated 4 months ago
google-research-datasets / QAmeleon
QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P…
☆33Updated last year
dylandoblar / noether-networks
Meta-learning inductive biases in the form of useful conserved quantities.
☆37Updated last year
srush / mamba-scans
Blog post
☆16Updated 7 months ago
davisyoshida / lorax
LoRA for arbitrary JAX models and functions
☆127Updated 6 months ago
vvvm23 / mamba-jax
Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX
☆74Updated 7 months ago
lucidrains / ponder-transformer
Implementation of a Transformer that Ponders, using the scheme from the PonderNet paper
☆78Updated 2 years ago
epfml / schedules-and-scaling
☆47Updated 3 months ago
lucidrains / memory-editable-transformer
My explorations into editing the knowledge and memories of an attention network
☆34Updated last year
cloneofsimo / min-fsdp
☆68Updated 2 months ago
srush / torch-golf
Silly twitter torch implementations.
☆46Updated last year
davisyoshida / qax
If it quacks like a tensor...
☆48Updated 7 months ago
lucaslingle / mu_transformer
Transformer with Mu-Parameterization, implemented in Jax/Flax. Supports FSDP on TPU pods.
☆29Updated 3 weeks ago
HomebrewNLP / HomebrewNLP
A case study of efficient training of large language models using commodity hardware.
☆68Updated 2 years ago
mgmalek / efficient_cross_entropy
☆66Updated 3 months ago