PyTorch implementation of "Oscillation-Reduced MXFP4 Training for Vision Transformers" on DeiT model pre-training
☆36 · updated Jun 20, 2025
Alternatives and similar repositories for TetraJet-MXFP4Training
Users interested in TetraJet-MXFP4Training also compare it to the repositories listed below.
- A collection of research papers on low-precision training methods (☆64 · updated May 10, 2025)
- No description (☆63 · updated Jul 21, 2024)
- Work in progress. (☆79 · updated Nov 25, 2025)
- No description (☆46 · updated May 20, 2025)
- Yet another frontend for LLMs, written using .NET and WinUI 3 (☆10 · updated Sep 14, 2025)
- Official implementation for Training LLMs with MXFP4 (☆120 · updated Apr 25, 2025)
- A selective knowledge distillation algorithm for efficient speculative decoders (☆36 · updated Nov 27, 2025)
- [ICML 2025] MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design (☆22 · updated Jul 4, 2025)
- LLM Inference with Microscaling Format (☆34 · updated Nov 12, 2024)
- A sleek, customizable interface for managing LLMs with responsive design and easy agent personalization. (☆17 · updated Aug 30, 2024)
- Official implementation for "Pruning Large Language Models with Semi-Structural Adaptive Sparse Training" (AAAI 2025) (☆18 · updated Jul 1, 2025)
- HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. 🚀 The official implementation of https://arx… (☆29 · updated Feb 17, 2025)
- super-resolution; post-training quantization; model compression (☆14 · updated Nov 10, 2023)
- QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning (☆168 · updated Nov 11, 2025)
- BESA is a differentiable weight pruning technique for large language models. (☆17 · updated Mar 4, 2024)
- VFXMaster: Unlocking Dynamic Visual Effect Generation via In-Context Learning (☆61 · updated Nov 4, 2025)
- [NeurIPS 2023] Token-Scaled Logit Distillation for Ternary Weight Generative Language Models (☆18 · updated Dec 6, 2023)
- No description (☆27 · updated Mar 29, 2025)
- Implementation of BitNet-1.58 instruct tuning (☆27 · updated Apr 14, 2024)
- AdaSplash: Adaptive Sparse Flash Attention (aka Flash Entmax Attention) (☆33 · updated Sep 30, 2025)
- [ICLR 2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM. (☆105 · updated Dec 20, 2024)
- FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration (☆20 · updated Jun 27, 2025)
- Proteus is an experimental platform that combines the power of Large Language Models with the Genesis physics engine (☆26 · updated Dec 20, 2024)
- No description (☆23 · updated Jun 12, 2024)
- Tcurtsni: Reverse Instruction Chat, ever wonder what your LLM wants to ask you? (☆23 · updated Jun 25, 2024)
- Low-bit optimizers for PyTorch (☆138 · updated Oct 9, 2023)
- A WebUI for Side-by-Side Comparison of Media (Images/Videos) Across Multiple Folders (☆25 · updated Feb 21, 2025)
- A framework to compare low-bit integer and floating-point formats (☆66 · updated Feb 6, 2026)
- No description (☆24 · updated Jan 22, 2025)
- Official repository for the paper "NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks". This rep… (☆60 · updated Oct 31, 2024)
- No description (☆30 · updated Jul 22, 2024)
- No description (☆120 · updated Jan 8, 2026)
- Official PyTorch implementation of "Outlier-weighed Layerwise Sampling for LLM Fine-tuning" by Pengxiang Li, Lu Yin, Xiaowei Gao, Shiwei … (☆35 · updated Jun 3, 2025)
- [ICCV 2023] EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization (☆28 · updated Dec 6, 2023)
- PyTorch implementation of our UniQ method, IEEE Access -- Training Multi-bit Quantized and Binarized Networks with A Learnable Symmetric … (☆11 · updated Apr 7, 2021)
- No description (☆21 · updated Dec 14, 2025)
- [ICML 2024] Sparse Model Inversion: Efficient Inversion of Vision Transformers with Less Hallucination (☆13 · updated Apr 29, 2025)
- Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales (☆32 · updated Jul 17, 2023)
- [CVPR 2025] PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation (☆45 · updated Jul 1, 2025)