thu-ml / TetraJet-MXFP4Training
PyTorch implementation of "Oscillation-Reduced MXFP4 Training for Vision Transformers" for DeiT model pre-training
☆36 · Jun 20, 2025 · Updated 7 months ago
Alternatives and similar repositories for TetraJet-MXFP4Training
Users who are interested in TetraJet-MXFP4Training are comparing it to the libraries listed below:
- A collection of research papers on low-precision training methods ☆60 · May 10, 2025 · Updated 9 months ago
- [ICML 2025] MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design ☆22 · Jul 4, 2025 · Updated 7 months ago
- Yet another frontend for LLM, written using .NET and WinUI 3 ☆10 · Sep 14, 2025 · Updated 5 months ago
- A selective knowledge distillation algorithm for efficient speculative decoders ☆36 · Nov 27, 2025 · Updated 2 months ago
- LLM Inference with Microscaling Format ☆34 · Nov 12, 2024 · Updated last year
- Official implementation for "Pruning Large Language Models with Semi-Structural Adaptive Sparse Training" (AAAI 2025) ☆18 · Jul 1, 2025 · Updated 7 months ago
- A sleek, customizable interface for managing LLMs with responsive design and easy agent personalization. ☆17 · Aug 30, 2024 · Updated last year
- HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. 🚀 The official implementation of https://arx… ☆29 · Feb 17, 2025 · Updated 11 months ago
- super-resolution; post-training quantization; model compression ☆14 · Nov 10, 2023 · Updated 2 years ago
- VFXMaster: Unlocking Dynamic Visual Effect Generation via In-Context Learning ☆59 · Nov 4, 2025 · Updated 3 months ago
- ☆27 · Mar 29, 2025 · Updated 10 months ago
- [NeurIPS 2023] Token-Scaled Logit Distillation for Ternary Weight Generative Language Models ☆18 · Dec 6, 2023 · Updated 2 years ago
- Implementation of BitNet-1.58 instruct tuning ☆27 · Apr 14, 2024 · Updated last year
- AdaSplash: Adaptive Sparse Flash Attention (aka Flash Entmax Attention) ☆32 · Sep 30, 2025 · Updated 4 months ago
- [ICLR2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM. ☆106 · Dec 20, 2024 · Updated last year
- Tcurtsni: Reverse Instruction Chat, ever wonder what your LLM wants to ask you? ☆23 · Jun 25, 2024 · Updated last year
- ☆23 · Jun 12, 2024 · Updated last year
- Low-bit optimizers for PyTorch ☆138 · Oct 9, 2023 · Updated 2 years ago
- A WebUI for Side-by-Side Comparison of Media (Images/Videos) Across Multiple Folders ☆25 · Feb 21, 2025 · Updated 11 months ago
- ☆24 · Jan 22, 2025 · Updated last year
- A fluent, scalable, and easy-to-use LLM data processing framework. ☆28 · Jan 31, 2026 · Updated 2 weeks ago
- Official repository for the paper "NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks". This rep… ☆60 · Oct 31, 2024 · Updated last year
- ☆119 · Jan 8, 2026 · Updated last month
- Official Pytorch Implementation of "Outlier-weighed Layerwise Sampling for LLM Fine-tuning" by Pengxiang Li, Lu Yin, Xiaowei Gao, Shiwei … ☆35 · Jun 3, 2025 · Updated 8 months ago
- [Preprint] RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments ☆177 · Jan 12, 2026 · Updated last month
- [ICML 2024] Sparse Model Inversion: Efficient Inversion of Vision Transformers with Less Hallucination ☆13 · Apr 29, 2025 · Updated 9 months ago
- Pytorch implementation of our UniQ method, IEEE Access -- Training Multi-bit Quantized and Binarized Networks with A Learnable Symmetric … ☆11 · Apr 7, 2021 · Updated 4 years ago
- Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales ☆32 · Jul 17, 2023 · Updated 2 years ago
- [CVPR 2025] PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation ☆44 · Jul 1, 2025 · Updated 7 months ago
- ☆35 · Dec 22, 2025 · Updated last month
- Pytorch implementation of our paper accepted by TPAMI 2023: Lottery Jackpots Exist in Pre-trained Models ☆35 · Jun 19, 2023 · Updated 2 years ago
- AlphaGeometry Re-engineered ☆39 · Nov 15, 2025 · Updated 2 months ago
- ☆26 · Feb 6, 2026 · Updated last week
- ☆35 · Mar 12, 2025 · Updated 11 months ago
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆93 · Sep 4, 2024 · Updated last year
- ☆157 · Jun 22, 2023 · Updated 2 years ago
- ☆43 · Dec 1, 2025 · Updated 2 months ago
- This is the official repository for the paper "Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction". ICCV … ☆23 · Dec 4, 2025 · Updated 2 months ago
- Promptopia is an open-source AI prompting tool for modern world to discover, create, and share creative prompts ☆12 · May 27, 2023 · Updated 2 years ago