Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
☆48 · Oct 16, 2025 · Updated 4 months ago
Alternatives and similar repositories for why-low-precision-training-fails
Users that are interested in why-low-precision-training-fails are comparing it to the libraries listed below
- ☆13 · Jul 25, 2024 · Updated last year
- This repository serves as a collection of scrapers procuring and structuring various legal datasets ☆18 · Jun 16, 2023 · Updated 2 years ago
- ☆17 · May 14, 2020 · Updated 5 years ago
- FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration ☆20 · Jun 27, 2025 · Updated 8 months ago
- ☆46 · May 20, 2025 · Updated 9 months ago
- FlexAttention w/ FlashAttention3 Support ☆27 · Oct 5, 2024 · Updated last year
- LLM Inference with Microscaling Format ☆34 · Nov 12, 2024 · Updated last year
- [ICML 2024] Sparse Model Inversion: Efficient Inversion of Vision Transformers with Less Hallucination ☆13 · Apr 29, 2025 · Updated 10 months ago
- Read, modify and write DICOS files with python code ☆13 · Nov 24, 2025 · Updated 3 months ago
- PTX-EMU is a simple emulator for CUDA program. ☆38 · Apr 25, 2025 · Updated 10 months ago
- ☆12 · Apr 2, 2024 · Updated last year
- Supporting code for "LLMs for your iPhone: Whole-Tensor 4 Bit Quantization" ☆11 · Mar 31, 2024 · Updated last year
- PyTorch-based radio-interferometric imaging reconstruction package with scalable Bayesian uncertainty quantification relying on data-driv… ☆12 · Feb 17, 2025 · Updated last year
- ☆10 · Apr 24, 2024 · Updated last year
- ☆52 · Nov 5, 2024 · Updated last year
- MATLAB function to fill an area with hatching ~~or speckling~~ ☆11 · Mar 4, 2018 · Updated 8 years ago
- ☆14 · Apr 14, 2025 · Updated 10 months ago
- An artificial matrix generator in C ☆12 · Feb 16, 2023 · Updated 3 years ago
- [CVPR 2025 Highlight] FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation ☆26 · Jun 16, 2025 · Updated 8 months ago
- A Library for Scaling Mixed-Integer Optimization-Based Machine Learning. ☆12 · Jun 24, 2024 · Updated last year
- BERT Sentiment Classification on the IMDb Large Movie Review Dataset. ☆16 · Sep 8, 2022 · Updated 3 years ago
- Code for the paper "Faster Neural Network Training with Approximate Tensor Operations" ☆10 · Oct 23, 2021 · Updated 4 years ago
- A Redis-compatible in-memory database server written in Rust with MLua-based Lua 5.1 scripting ☆17 · Nov 28, 2025 · Updated 3 months ago
- [ICLR 2025] TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention ☆53 · Aug 6, 2025 · Updated 7 months ago
- A benchmark of programming tasks for LLMs that supports almost any programming language. ☆13 · Jun 30, 2025 · Updated 8 months ago
- Analyzes whole genome sequencing data for gene-editing verification ☆10 · Feb 6, 2026 · Updated last month
- Musings in GEMM (General Matrix Multiplication) ☆14 · Dec 14, 2025 · Updated 2 months ago
- Official Implementation of Robustifying and Boosting Training-Free Neural Architecture Search ☆10 · Mar 12, 2024 · Updated last year
- Arabic Grapheme-to-Phoneme (G2P) Conversion ☆13 · Mar 15, 2025 · Updated 11 months ago
- codes and plots for "Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs" ☆10 · Dec 30, 2024 · Updated last year
- APB UVC ported to Verilator ☆11 · Nov 19, 2023 · Updated 2 years ago
- (WIP) A relatively simple pipelined RISC-V core, written in Bluespec SystemVerilog ☆12 · Sep 9, 2021 · Updated 4 years ago
- A survey of manufacturer-provided DRAM operating parameters and timings as specified by DRAM chip datasheets from between 1970 and 2021. … ☆11 · May 4, 2022 · Updated 3 years ago
- sgx-based encrypted deduplication prototype ☆14 · May 14, 2021 · Updated 4 years ago
- FPGA-based HyperLogLog Accelerator ☆12 · Jul 13, 2020 · Updated 5 years ago
- ☆13 · Jul 14, 2025 · Updated 7 months ago
- Residual vector quantization for KV cache compression in large language model ☆11 · Oct 22, 2024 · Updated last year
- Highly concurrent and fast content processing for Mighty Inference Server ☆10 · Feb 6, 2023 · Updated 3 years ago
- ☆11 · Apr 5, 2023 · Updated 2 years ago