Official implementation for Training LLMs with MXFP4
☆120Apr 25, 2025Updated 10 months ago
Alternatives and similar repositories for mxfp4-llm
Users who are interested in mxfp4-llm are comparing it to the libraries listed below.
- PyTorch implementation of "Oscillation-Reduced MXFP4 Training for Vision Transformers" on DeiT Model Pre-training☆36Jun 20, 2025Updated 8 months ago
- ☆120Jan 8, 2026Updated 2 months ago
- PyTorch emulation library for Microscaling (MX)-compatible data formats☆347Jun 18, 2025Updated 8 months ago
- The official repo for "CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models"☆30Updated this week
- Work in progress.☆79Nov 25, 2025Updated 3 months ago
- A framework to compare low-bit integer and floating-point formats☆66Feb 6, 2026Updated last month
- JoVA: Unified Multimodal Learning for Joint Video-Audio Generation☆30Dec 22, 2025Updated 2 months ago
- Concise Reasoning via Reinforcement Learning☆13Apr 16, 2025Updated 10 months ago
- [ICML 2025] MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design☆22Jul 4, 2025Updated 8 months ago
- Code for the paper “Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling”☆132Updated this week
- Ongoing research training transformer models at scale☆18Updated this week
- PyTorch implementation of the Gato paper from DeepMind☆12Feb 8, 2023Updated 3 years ago
- HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. 🚀 The official implementation of https://arx…☆29Feb 17, 2025Updated last year
- ☆16Jul 16, 2024Updated last year
- ☆15Sep 22, 2024Updated last year
- Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention☆48Oct 16, 2025Updated 4 months ago
- ☆21Apr 2, 2025Updated 11 months ago
- ☆40Dec 19, 2025Updated 2 months ago
- ☆16Sep 12, 2024Updated last year
- PoE-World: Compositional World Modeling with Products of Programmatic Experts☆39Feb 5, 2026Updated last month
- Code for ACL 2023 Paper: ACLM: A Selective-Denoising based Generative Data Augmentation Approach for Low-Resource Complex NER☆21Jul 19, 2023Updated 2 years ago
- Multiple GEMM operators are constructed with cutlass to support LLM inference.☆21Aug 3, 2025Updated 7 months ago
- Contains my experiments with the `big_vision` repo to train ViTs on ImageNet-1k.☆22Jan 16, 2023Updated 3 years ago
- ☆27Mar 29, 2025Updated 11 months ago
- Fine-tuning Korean natural language processing models☆17Jan 26, 2021Updated 5 years ago
- Inverted file system for billion-scale ANN search☆19Dec 7, 2023Updated 2 years ago
- ☆20Oct 8, 2024Updated last year
- Mitigating Partial Observability in Sequential Decision Processes via the Lambda Discrepancy☆23Oct 28, 2024Updated last year
- Drop-in environment replacements that make your RL algorithm train faster.☆21Jun 19, 2024Updated last year
- Code of the Paper "Time-Efficient Reinforcement Learning with Stochastic Stateful Policies"☆25May 5, 2024Updated last year
- FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration☆20Jun 27, 2025Updated 8 months ago
- new optimizer☆20Aug 4, 2024Updated last year
- This repository is the official implementation of the TRAC optimizer in Fast TRAC: A Parameter-Free Optimizer for Lifelong Reinforcement …☆33May 2, 2025Updated 10 months ago
- ☆22Apr 12, 2022Updated 3 years ago
- ☆24Dec 26, 2023Updated 2 years ago
- ☆60Mar 3, 2025Updated last year
- Google's Conceptual Captions Dataset translated into Korean☆23Aug 28, 2022Updated 3 years ago
- [ICML 2023] Exploring the Benefits of Training Expert Language Models over Instruction Tuning☆98Apr 26, 2023Updated 2 years ago
- Anh - LAION's multilingual assistant datasets and models☆27Apr 5, 2023Updated 2 years ago