Red-Hat-AI-Innovation-Team / SQuat
☆13 · Updated 3 weeks ago
Alternatives and similar repositories for SQuat
Users interested in SQuat are comparing it to the repositories listed below.
- An extension of the GaLore paper, performing Natural Gradient Descent in a low-rank subspace ☆16 · Updated 6 months ago
- ☆78 · Updated 8 months ago
- Unofficial implementation of Selective Attention Transformer ☆16 · Updated 6 months ago
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆29 · Updated last month
- This repo is based on https://github.com/jiaweizzhao/GaLore ☆27 · Updated 7 months ago
- ☆17 · Updated 4 months ago
- ☆12 · Updated 4 months ago
- ☆24 · Updated last month
- Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More ☆31 · Updated this week
- Work in progress. ☆62 · Updated last month
- ☆37 · Updated 7 months ago
- ☆51 · Updated 6 months ago
- Activation-aware Singular Value Decomposition for Compressing Large Language Models ☆66 · Updated 6 months ago
- ☆49 · Updated last month
- [ICML 2025] Reward-guided Speculative Decoding (RSD) for efficiency and effectiveness ☆29 · Updated 2 weeks ago
- Official implementation of the ICML 2024 paper RoSA (Robust Adaptation) ☆41 · Updated last year
- Code for the EMNLP24 paper "A simple and effective L2 norm based method for KV Cache compression" ☆12 · Updated 5 months ago
- ☆31 · Updated 4 months ago
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆30 · Updated 2 months ago
- Official implementation of the ECCV24 paper POA ☆24 · Updated 9 months ago
- ☆18 · Updated this week
- Code for "Reasoning to Learn from Latent Thoughts" ☆94 · Updated last month
- Code for data-aware compression of DeepSeek models ☆24 · Updated last month
- SIFT: Grounding LLM Reasoning in Contexts via Stickers ☆56 · Updated 2 months ago
- The official implementation of the paper "Towards Efficient Mixture of Experts: A Holistic Study of Compression Techniques" (TMLR) ☆67 · Updated last month
- A repository for research on medium-sized language models ☆76 · Updated 11 months ago
- [ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models ☆89 · Updated 11 months ago
- This repo contains the source code for "Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs" ☆37 · Updated 9 months ago
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" ☆26 · Updated 6 months ago
- Tiny re-implementation of MDM in the style of LLaDA and the nano-gpt speedrun ☆50 · Updated 2 months ago