HALO: Hadamard-Assisted Low-Precision Optimization, a training method for finetuning LLMs. 🚀 Official implementation of https://arxiv.org/abs/2501.02625
☆29 · Updated Feb 17, 2025
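The core idea behind Hadamard-assisted quantization methods like HALO is that multiplying a matrix by an orthogonal Hadamard transform spreads outlier values across all coordinates, so a symmetric quantizer wastes less range. The sketch below is a generic illustration of that effect, not HALO's actual algorithm; the matrix sizes, outlier pattern, and int8 scheme are illustrative assumptions.

```python
import numpy as np

def hadamard(n):
    # Sylvester construction: n must be a power of two.
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)  # scaled so that H is orthonormal

def quantize_int8(x):
    # Symmetric per-tensor int8 fake-quantization (quantize then dequantize).
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127)
    return q * scale

rng = np.random.default_rng(0)
n = 64
# Weights with one large-magnitude column, mimicking the outlier
# channels commonly observed in LLM layers.
W = rng.normal(size=(n, n))
W[:, 0] *= 50.0

H = hadamard(n)
# Rotate, quantize, rotate back; H @ H.T = I, so this is exact
# up to quantization error.
W_hadamard = quantize_int8(W @ H) @ H.T

err_plain = np.linalg.norm(quantize_int8(W) - W)
err_hadamard = np.linalg.norm(W_hadamard - W)
print(err_plain, err_hadamard)
```

With the outlier column mixed into every coordinate, the quantization scale shrinks and the reconstruction error of the rotated path is typically much smaller than quantizing the raw weights.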
Alternatives and similar repositories for HALO
Users that are interested in HALO are comparing it to the libraries listed below
- [ICML 2025] SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity ☆71 · Updated Jul 5, 2025
- Explore training for quantized models ☆26 · Updated Jul 12, 2025
- IntLLaMA: A fast and light quantization solution for LLaMA ☆18 · Updated Jul 21, 2023
- [HPCA 2026] A GPU-optimized system for efficient long-context LLM decoding with a low-bit KV cache. ☆80 · Updated Dec 18, 2025
- ☆63 · Updated Jul 21, 2024
- PyTorch implementation of "Oscillation-Reduced MXFP4 Training for Vision Transformers" for DeiT model pre-training ☆36 · Updated Jun 20, 2025
- High-speed GEMV kernels with up to 2.7× speedup over the PyTorch baseline. ☆128 · Updated Jul 13, 2024
- ☆46 · Updated Feb 17, 2026
- ☆35 · Updated Dec 22, 2025
- ☆34 · Updated Jul 15, 2021
- [EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization ☆38 · Updated Sep 24, 2024
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference ☆372 · Updated Jul 10, 2025
- ☆156 · Updated Jun 22, 2023
- ☆11 · Updated Jan 13, 2026
- Official implementation of "Bayes Conditional Distribution Estimation for Knowledge Distillation Based on Conditional Mutual Information" ☆11 · Updated Sep 28, 2023
- Source-code analysis of LangGraph's deepagent ☆16 · Updated Jan 1, 2026
- Official repository for the paper "Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regressi…" ☆23 · Updated Oct 1, 2025
- Fast Matrix Multiplications for Lookup Table-Quantized LLMs ☆388 · Updated Apr 13, 2025
- Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA, using CUDA cores for the decoding stage of LLM inference. ☆46 · Updated Jun 11, 2025
- Concise Reasoning via Reinforcement Learning ☆13 · Updated Apr 16, 2025
- A simple neural network in C++17 using the Eigen library, supporting both forward and backward propagation. ☆10 · Updated Jul 27, 2024
- ☆18 · Updated Nov 26, 2025
- ☆29 · Updated Nov 19, 2025
- The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens. ☆13 · Updated Mar 30, 2024
- Generating Summaries with Controllable Readability Levels (EMNLP 2023) ☆14 · Updated Aug 6, 2025
- EoRA: Fine-tuning-free Compensation for Compressed LLMs with Eigenspace Low-Rank Approximation ☆27 · Updated Jul 30, 2025
- My solution code for a parallel architecture and programming course, Spring 2016 ☆12 · Updated Aug 15, 2016
- Deep learning framework specialised for Binarized Neural Networks. ☆11 · Updated Jan 9, 2022
- ChatGPT solutions for the MLE interview ☆14 · Updated Dec 9, 2022
- ☆13 · Updated May 3, 2024
- ☆10 · Updated Mar 31, 2022
- Counterfactual Explanation Based on Gradual Construction for Deep Networks (PyTorch) ☆11 · Updated Apr 7, 2021
- Experiments with LLMs for Vietnamese (inspired by the Physics of LLMs series) ☆11 · Updated Oct 21, 2024
- ☆13 · Updated Feb 20, 2026
- Lossless normalization of uppercase characters ☆11 · Updated Jul 3, 2023
- Modified K-BERT code so that it fits English datasets ☆11 · Updated Dec 15, 2022
- ☆12 · Updated Aug 18, 2023
- Code for a Towards Data Science article on prompt-loss-weight ☆11 · Updated Jun 4, 2025
- ☆11 · Updated Dec 23, 2022