amazon-science/mxfp4-llm

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/amazon-science/mxfp4-llm)

amazon-science / mxfp4-llm

Official implementation for Training LLMs with MXFP4

☆130

Alternatives and similar repositories for mxfp4-llm

Users that are interested in mxfp4-llm are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

IST-DASLab / Quartet
View on GitHub
☆127Mar 18, 2026Updated 4 months ago
Anonymous1252022 / fp4-all-the-way
View on GitHub
☆51May 20, 2025Updated last year
microsoft / microxcaling
View on GitHub
PyTorch emulation library for Microscaling (MX)-compatible data formats
☆358Jul 17, 2026Updated last week
IST-DASLab / FP-Quant
View on GitHub
☆116Feb 26, 2026Updated 5 months ago
mit-han-lab / fouroversix
View on GitHub
Code for the papers: “Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling” and “Adaptive Block-Scaled Data Types”
☆199Apr 21, 2026Updated 3 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
thu-ml / TetraJet-MXFP4Training
View on GitHub
Pytorch implementation of "Oscillation-Reduced MXFP4 Training for Vision Transformers" on DeiT Model Pre-training
☆40May 4, 2026Updated 2 months ago
IST-DASLab / qutlass
View on GitHub
QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning
☆194Jul 20, 2026Updated last week
lhb8125 / Megatron-LM
View on GitHub
Ongoing research training transformer models at scale
☆19Updated this week
IST-DASLab / QuEST
View on GitHub
Work in progress.
☆80Nov 25, 2025Updated 8 months ago
Cornell-RelaxML / yaqa-quantization
View on GitHub
☆85Jun 20, 2025Updated last year
yanring / Megatron-MoE-ModelZoo
View on GitHub
Best practices for training DeepSeek, Mixtral, Qwen and other MoE models using Megatron Core.
☆201May 29, 2026Updated 2 months ago
hongsunjang / HILOS
View on GitHub
[ASPLOS'26] HILOS: A Cost-Effective Near-Storage Processing Solution for Offline Inference of Long-Context LLMs
☆20Jan 18, 2026Updated 6 months ago
aiha-lab / MX-QLLM
View on GitHub
LLM Inference with Microscaling Format
☆35Nov 12, 2024Updated last year
IST-DASLab / Sparse-Marlin
View on GitHub
Boosting 4-bit inference kernels with 2:4 Sparsity
☆96Sep 4, 2024Updated last year
GPUs on demand by Runpod - Special Offer Available • Ad
Run AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
ChenMnZ / INT_vs_FP
View on GitHub
[ICML 2026]A framework to compare low-bit integer and float-point formats
☆81May 6, 2026Updated 2 months ago
IST-DASLab / HALO
View on GitHub
HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. 🚀 The official implementation of https://arx…
☆31Feb 17, 2025Updated last year
Aaronhuang-778 / SliM-LLM
View on GitHub
[ICML 2025] SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models
☆62Aug 9, 2024Updated last year
joeljang / FLM
View on GitHub
All-in-one repository for Fine-tuning & Pretraining (Large) Language Models
☆15Mar 8, 2023Updated 3 years ago
AIS-SNU / GraNNDis_Artifact
View on GitHub
[PACT'24] GraNNDis. A fast and unified distributed graph neural network (GNN) training framework for both full-batch (full-graph) and min…
☆10Aug 13, 2024Updated last year
chu-tianxiang / QuIP-for-all
View on GitHub
QuIP quantization
☆66Mar 17, 2024Updated 2 years ago
IST-DASLab / Quartet-II
View on GitHub
Quartet II Official Code
☆77May 1, 2026Updated 2 months ago
ruikangliu / FlatQuant
View on GitHub
[ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization"
☆223Nov 25, 2025Updated 8 months ago
TianjinYellow / StableSPAM
View on GitHub
☆28Jul 2, 2026Updated 3 weeks ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
jason9693 / FROZEN
View on GitHub
☆14May 3, 2022Updated 4 years ago
hongsunjang / pipe-bd
View on GitHub
[DATE 2023] Pipe-BD: Pipelined Parallel Blockwise Distillation
☆12Jul 13, 2023Updated 3 years ago
Azure / MS-AMP
View on GitHub
Microsoft Automatic Mixed Precision Library
☆636Dec 1, 2025Updated 7 months ago
spcl / QuaRot
View on GitHub
Code for Neurips24 paper: QuaRot, an end-to-end 4-bit inference of large language models.
☆526Nov 26, 2024Updated last year
NVIDIA / TransformerEngine
View on GitHub
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H…
☆3,455Updated this week
Dao-AILab / fast-hadamard-transform
View on GitHub
Fast Hadamard transform in CUDA, with a PyTorch interface
☆343Mar 10, 2026Updated 4 months ago
HydraQYH / expert_specialization_moe
View on GitHub
Expert Specialization MoE Solution based on CUTLASS
☆27Apr 14, 2026Updated 3 months ago
HabanaAI / Megatron-DeepSpeed
View on GitHub
Intel Gaudi's Megatron DeepSpeed Large Language Models for training
☆18Dec 19, 2024Updated last year
utkarsh-dmx / project-resq
View on GitHub
☆35Mar 28, 2025Updated last year
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
pytorch / ao
View on GitHub
PyTorch native quantization and sparsity for training and inference
☆2,917Updated this week
peichenxie / FPRev
View on GitHub
☆26May 9, 2025Updated last year
Bruce-Lee-LY / cutlass_gemm
View on GitHub
Multiple GEMM operators are constructed with cutlass to support LLM inference.
☆20Aug 3, 2025Updated 11 months ago
ChengZhang-98 / llm-mixed-q
View on GitHub
Official implementation of EMNLP'23 paper "Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?"
☆24Oct 25, 2023Updated 2 years ago
Agora-Lab-AI / BitNet-a4.8
View on GitHub
BitNet a4.8 Implementation in one file of pytorch
☆24Jan 13, 2025Updated last year
aredden / torch-bnb-fp4
View on GitHub
Faster Pytorch bitsandbytes 4bit fp4 nn.Linear ops
☆30Mar 16, 2024Updated 2 years ago
Cornell-RelaxML / qtip
View on GitHub
☆181Jun 22, 2025Updated last year