lucaslingle / mu_transformer
Transformer with Mu-Parameterization, implemented in Jax/Flax. Supports FSDP on TPU pods.
☆32 · Jun 5, 2025 · Updated 8 months ago
Alternatives and similar repositories for mu_transformer
Users who are interested in mu_transformer are comparing it to the libraries listed below.
- Maximal Update Parametrization (μP) with Flax & Optax. ☆16 · Dec 27, 2023 · Updated 2 years ago
- Track and Collaborate on ML & AI Experiments. ☆44 · Mar 10, 2025 · Updated 11 months ago
- JAX Scalify: end-to-end scaled arithmetics ☆18 · Oct 30, 2024 · Updated last year
- Supercharge huggingface transformers with model parallelism. ☆78 · Jul 23, 2025 · Updated 6 months ago
- ☆12 · Jan 4, 2024 · Updated 2 years ago
- nanoGPT using Equinox ☆15 · Mar 3, 2023 · Updated 2 years ago
- ☆14 · Oct 30, 2024 · Updated last year
- Rust implementation of Surya ☆65 · Mar 1, 2025 · Updated 11 months ago
- Benchmarking Mobile Device Control Agents across Diverse Configurations (ICLR 2024 workshop GenAI4DM spotlight presentation; CoLLAs 2025) ☆35 · Jul 21, 2025 · Updated 6 months ago
- Minimal but scalable implementation of large language models in JAX ☆35 · Nov 28, 2025 · Updated 2 months ago
- Minimal implementation of VCRec (2024) for collapse prevention. ☆18 · Jan 28, 2025 · Updated last year
- ☆16 · Oct 20, 2025 · Updated 3 months ago
- Two implementations of ZeRO-1 optimizer sharding in JAX ☆14 · Jun 11, 2023 · Updated 2 years ago
- Schedule free optimiser implemented in JAX using Optimistix ☆15 · May 29, 2024 · Updated last year
- Basic world models ☆30 · Oct 30, 2025 · Updated 3 months ago
- ☆40 · Jul 26, 2024 · Updated last year
- A Python toolkit for analyzing machine learning models and datasets. ☆79 · Sep 8, 2023 · Updated 2 years ago
- ☆18 · Aug 24, 2024 · Updated last year
- A library for unit scaling in PyTorch ☆133 · Jul 11, 2025 · Updated 7 months ago
- An implementation of the Llama architecture, to instruct and delight ☆21 · May 31, 2025 · Updated 8 months ago
- Code for the paper "Function-Space Learning Rates" ☆25 · Jun 3, 2025 · Updated 8 months ago
- An open source interactive spectrogram audio player, primarily based on bokeh and the holoviz stack (wav+holoviz=waloviz) ☆68 · Jan 19, 2026 · Updated 3 weeks ago
- Lightweight tools for quick and easy LLM demos ☆28 · Sep 22, 2024 · Updated last year
- PyTorch centric eager mode debugger ☆48 · Dec 16, 2024 · Updated last year
- Utilities for Training Very Large Models ☆58 · Sep 25, 2024 · Updated last year
- A port of muP to JAX/Haiku ☆25 · Oct 23, 2022 · Updated 3 years ago
- Utilities for efficient fine-tuning, inference and evaluation of code generation models ☆21 · Oct 3, 2023 · Updated 2 years ago
- ☆23 · Jun 18, 2024 · Updated last year
- Interactive coding assistant for data scientists and machine learning developers, empowered by large language models. ☆99 · Oct 8, 2024 · Updated last year
- ☆21 · Mar 3, 2025 · Updated 11 months ago
- Develop, evaluate and monitor LLM applications at scale ☆100 · Nov 29, 2024 · Updated last year
- A fast RWKV Tokenizer written in Rust ☆54 · Aug 12, 2025 · Updated 6 months ago
- Experimental GPU language with meta-programming ☆25 · Sep 6, 2024 · Updated last year
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ☆27 · Oct 13, 2024 · Updated last year
- ☆63 · Feb 4, 2024 · Updated 2 years ago
- Code repository for the public reproduction of the language modelling experiments on "MatFormer: Nested Transformer for Elastic Inference… ☆31 · Nov 14, 2023 · Updated 2 years ago
- Scaling is a distributed training library and installable dependency designed to scale up neural networks, with a dedicated module for tr… ☆66 · Nov 18, 2025 · Updated 2 months ago
- seqax = sequence modeling + JAX ☆170 · Jul 23, 2025 · Updated 6 months ago
- GPT-style network for phonemization with durations of text ☆68 · Mar 21, 2024 · Updated last year