☆316 · Jun 21, 2024 · Updated last year
Alternatives and similar repositories for ml-sigma-reparam
Users interested in ml-sigma-reparam are comparing it to the libraries listed below.
- Triton Implementation of HyperAttention Algorithm ☆48 · Dec 11, 2023 · Updated 2 years ago
- ☆47 · Jan 18, 2024 · Updated 2 years ago
- Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training ☆132 · Apr 17, 2024 · Updated last year
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆248 · Jun 6, 2025 · Updated 9 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆138 · Dec 17, 2024 · Updated last year
- Annotated version of the Mamba paper ☆497 · Feb 27, 2024 · Updated 2 years ago
- ☆13 · Apr 7, 2024 · Updated last year
- Schedule-Free Optimization in PyTorch ☆2,257 · May 21, 2025 · Updated 9 months ago
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore". ☆224 · Dec 16, 2025 · Updated 2 months ago
- For optimization algorithm research and development. ☆557 · Updated this week
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆39 · Jun 11, 2025 · Updated 8 months ago
- Accelerated First Order Parallel Associative Scan ☆195 · Jan 7, 2026 · Updated last month
- ☆93 · Jul 5, 2024 · Updated last year
- Combining SOAP and MUON ☆19 · Feb 11, 2025 · Updated last year
- ☆13 · Feb 5, 2024 · Updated 2 years ago
- ☆15 · Mar 2, 2025 · Updated last year
- ☆596 · Aug 23, 2024 · Updated last year
- What would you do with 1000 H100s... ☆1,154 · Jan 10, 2024 · Updated 2 years ago
- ☆292 · Jul 15, 2024 · Updated last year
- HomebrewNLP in JAX flavour for maintainable TPU-Training ☆51 · Jan 20, 2024 · Updated 2 years ago
- Convolutions for Sequence Modeling ☆913 · Jun 13, 2024 · Updated last year
- ☆57 · Mar 22, 2024 · Updated last year
- Fast & Simple repository for pre-training and fine-tuning T5-style models ☆1,017 · Aug 21, 2024 · Updated last year
- Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023) ☆81 · Aug 30, 2023 · Updated 2 years ago
- A library for unit scaling in PyTorch ☆133 · Jul 11, 2025 · Updated 7 months ago
- ☆29 · Feb 27, 2024 · Updated 2 years ago
- ☆23 · Jun 18, 2024 · Updated last year
- Griffin MQA + Hawk Linear RNN Hybrid ☆89 · Apr 26, 2024 · Updated last year
- maximal update parametrization (µP) ☆1,686 · Jul 17, 2024 · Updated last year
- Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023) ☆127 · Oct 11, 2023 · Updated 2 years ago
- Triton-based implementation of Sparse Mixture of Experts. ☆268 · Oct 3, 2025 · Updated 5 months ago
- JAX implementation of the Llama 2 model ☆216 · Feb 2, 2024 · Updated 2 years ago
- ☆24 · Sep 25, 2024 · Updated last year
- Type annotations and runtime checking for shape and dtype of JAX/NumPy/PyTorch/etc. arrays. https://docs.kidger.site/jaxtyping/ ☆1,743 · Feb 16, 2026 · Updated 2 weeks ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆595 · Aug 12, 2025 · Updated 6 months ago
- Stanford NLP Python library for Representation Finetuning (ReFT) ☆1,560 · Jan 14, 2026 · Updated last month
- Self-Conditioning Pre-Trained Language Models, ICML 2022 ☆34 · Jul 12, 2022 · Updated 3 years ago
- implementation of https://arxiv.org/pdf/2312.09299 ☆21 · Jul 3, 2024 · Updated last year
- [NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se… ☆67 · Apr 24, 2024 · Updated last year