apple/ml-sigma-reparam

apple / ml-sigma-reparam

☆315

Alternatives and similar repositories for ml-sigma-reparam

Users who are interested in ml-sigma-reparam are comparing it to the libraries listed below.

amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48 · Dec 11, 2023 · Updated 2 years ago
apple / ml-ogen
☆13 · Apr 7, 2024 · Updated 2 years ago
google-deepmind / asyncdiloco
☆51 · Jan 18, 2024 · Updated 2 years ago
cloneofsimo / min-max-gpt
Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training
☆132 · Apr 17, 2024 · Updated 2 years ago
facebookresearch / optimizers
For optimization algorithm research and development.
☆579 · Updated this week
HazyResearch / based
Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"
☆256 · Jun 6, 2025 · Updated last year
facebookresearch / schedule_free
Schedule-Free Optimization in PyTorch
☆2,317 · Jun 18, 2026 · Updated last month
proger / accelerated-scan
Accelerated First Order Parallel Associative Scan
☆198 · Jan 7, 2026 · Updated 6 months ago
evanatyourservice / llm-jax
Train a SmolLM-style LLM on fineweb-edu in JAX/Flax with an assortment of optimizers.
☆19 · Jul 24, 2025 · Updated last year
srush / annotated-mamba
Annotated version of the Mamba paper
☆501 · Feb 27, 2024 · Updated 2 years ago
apple / ml-planner
☆60 · Mar 22, 2024 · Updated 2 years ago
cloneofsimo / min-fsdp
☆93 · Jul 5, 2024 · Updated 2 years ago
proger / hippogriff
Griffin MQA + Hawk Linear RNN Hybrid
☆89 · Apr 13, 2026 · Updated 3 months ago
databricks / megablocks
☆1,583 · Mar 25, 2026 · Updated 4 months ago
fyvo / WMT-Biomed-Test
☆13 · Aug 23, 2024 · Updated last year
thjashin / multires-conv
Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023)
☆127 · Oct 11, 2023 · Updated 2 years ago
apoorvumang / prompt-lookup-decoding
Simple speculative decoding technique, integrated in vLLM and transformers
☆611 · Aug 23, 2024 · Updated last year
apple / ml-reed
☆13 · Feb 5, 2024 · Updated 2 years ago
dayal-kalra / low-memory-adam
☆14 · Mar 2, 2025 · Updated last year
patrick-kidger / jaxtyping
Type annotations and runtime checking for shape and dtype of JAX/NumPy/PyTorch/etc. arrays. https://docs.kidger.site/jaxtyping/
☆1,845 · Jul 8, 2026 · Updated 2 weeks ago
athms / mad-lab
A MAD laboratory to improve AI architecture designs 🧪
☆146 · Dec 17, 2024 · Updated last year
HomebrewML / Olmax
HomebrewNLP in JAX flavour for maintainable TPU-Training
☆50 · Jan 20, 2024 · Updated 2 years ago
srush / LLM-Training-Puzzles
What would you do with 1000 H100s...
☆1,185 · Jan 10, 2024 · Updated 2 years ago
google-deepmind / nanodo
☆304 · Jul 15, 2024 · Updated 2 years ago
HazyResearch / safari
Convolutions for Sequence Modeling
☆916 · Jun 13, 2024 · Updated 2 years ago
microsoft / mup
Maximal update parametrization (µP)
☆1,741 · Jul 17, 2024 · Updated 2 years ago
shawntan / scattermoe
Triton-based implementation of Sparse Mixture of Experts.
☆281 · Oct 3, 2025 · Updated 9 months ago
RobertCsordas / moe
Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers"
☆39 · Jun 11, 2025 · Updated last year
PiotrNawrot / nanoT5
Fast & Simple repository for pre-training and fine-tuning T5-style models
☆1,021 · Aug 21, 2024 · Updated last year
Sea-Snell / JAXSeq
Train very large language models in Jax.
☆208 · Oct 21, 2023 · Updated 2 years ago
smpanaro / more-ane-transformers
Run transformers (incl. LLMs) on the Apple Neural Engine.
☆63 · Nov 22, 2023 · Updated 2 years ago
google / aqt
☆359 · Updated this week
JonasGeiping / linear_cross_entropy_loss
A fusion of a linear layer and a cross entropy loss, written for PyTorch in Triton.
☆75 · Aug 2, 2024 · Updated last year
apple / ml-tract
☆48 · Apr 24, 2023 · Updated 3 years ago
RulinShao / retrieval-scaling
Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore".
☆226 · Dec 16, 2025 · Updated 7 months ago
JeanKaddour / NoTrainNoGain
Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023)
☆81 · Aug 30, 2023 · Updated 2 years ago
apple / ml-mdm
Train high-quality text-to-image diffusion models in a data & compute efficient manner
☆515 · Jun 25, 2026 · Updated last month
pytorch / PiPPy
Pipeline Parallelism for PyTorch
☆786 · Aug 21, 2024 · Updated last year
tobiaskatsch / GatedLinearRNN
☆30 · Feb 27, 2024 · Updated 2 years ago