ucker / why-low-precision-training-fails
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
☆44 · Oct 16, 2025 · Updated 4 months ago
Alternatives and similar repositories for why-low-precision-training-fails
Users who are interested in why-low-precision-training-fails are comparing it to the repositories listed below
- ☆13 · Jul 25, 2024 · Updated last year
- This repository serves as a collection of scrapers procuring and structuring various legal datasets ☆18 · Jun 16, 2023 · Updated 2 years ago
- ☆17 · May 14, 2020 · Updated 5 years ago
- FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration ☆20 · Jun 27, 2025 · Updated 7 months ago
- ☆46 · May 20, 2025 · Updated 8 months ago
- FlexAttention w/ FlashAttention3 Support ☆27 · Oct 5, 2024 · Updated last year
- LLM Inference with Microscaling Format ☆34 · Nov 12, 2024 · Updated last year
- a project to manipulate pdb files ☆10 · Mar 6, 2024 · Updated last year
- [ICML 2024] Sparse Model Inversion: Efficient Inversion of Vision Transformers with Less Hallucination ☆13 · Apr 29, 2025 · Updated 9 months ago
- Read, modify and write DICOS files with python code ☆12 · Nov 24, 2025 · Updated 2 months ago
- Kinematic and dynamic models of continuum and articulated soft robots. ☆15 · Nov 22, 2025 · Updated 2 months ago
- PTX-EMU is a simple emulator for CUDA program. ☆37 · Apr 25, 2025 · Updated 9 months ago
- ☆52 · Nov 5, 2024 · Updated last year
- ☆12 · Apr 2, 2024 · Updated last year
- An artificial matrix generator in C ☆12 · Feb 16, 2023 · Updated 3 years ago
- A Redis-compatible in-memory database server written in Rust with MLua-based Lua 5.1 scripting ☆17 · Nov 28, 2025 · Updated 2 months ago
- A Library for Scaling Mixed-Integer Optimization-Based Machine Learning. ☆12 · Jun 24, 2024 · Updated last year
- Supporting code for "LLMs for your iPhone: Whole-Tensor 4 Bit Quantization" ☆11 · Mar 31, 2024 · Updated last year
- ☆10 · Apr 24, 2024 · Updated last year
- [CVPR 2025 Highlight] FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation ☆25 · Jun 16, 2025 · Updated 8 months ago
- PyTorch-based radio-interferometric imaging reconstruction package with scalable Bayesian uncertainty quantification relying on data-driv… ☆12 · Feb 17, 2025 · Updated 11 months ago
- BERT Sentiment Classification on the IMDb Large Movie Review Dataset. ☆16 · Sep 8, 2022 · Updated 3 years ago
- MATLAB function to fill an area with hatching ~~or speckling~~ ☆11 · Mar 4, 2018 · Updated 7 years ago
- Code for the paper "Faster Neural Network Training with Approximate Tensor Operations" ☆10 · Oct 23, 2021 · Updated 4 years ago
- ☆14 · Apr 14, 2025 · Updated 10 months ago
- ☆10 · Apr 2, 2024 · Updated last year
- [ICLR 2025] TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention ☆52 · Aug 6, 2025 · Updated 6 months ago
- Locality sensitive hash functions for Tensorflow 2.0. ☆12 · Feb 18, 2022 · Updated 3 years ago
- Official implementation of "Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent". ☆21 · May 23, 2025 · Updated 8 months ago
- ☆13 · Jul 14, 2025 · Updated 7 months ago
- sgx-based encrypted deduplication prototype ☆14 · May 14, 2021 · Updated 4 years ago
- example apps for inference.sh ☆18 · Feb 9, 2026 · Updated last week
- ☆11 · Apr 5, 2023 · Updated 2 years ago
- Try to export the ONNX QDQ model that conforms to the AXERA NPU quantization specification. Currently, only w8a8 is supported. ☆11 · Sep 10, 2024 · Updated last year
- FPGA-based HyperLogLog Accelerator ☆12 · Jul 13, 2020 · Updated 5 years ago
- codes and plots for "Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs" ☆10 · Dec 30, 2024 · Updated last year
- APB UVC ported to Verilator ☆11 · Nov 19, 2023 · Updated 2 years ago
- JsonTuning: Towards Generalizable, Robust, and Controllable Instruction Tuning ☆10 · Nov 3, 2024 · Updated last year
- Code for "AtTGen: Attribute Tree Generation for Real-World Attribute Joint Extraction", ACL 2023 ☆13 · May 19, 2023 · Updated 2 years ago