Layer-wise Pruning of Transformer Heads for Efficient Language Modeling
☆22 Feb 22, 2022 Updated 4 years ago
Alternatives and similar repositories for Attention-Head-Pruning
Users that are interested in Attention-Head-Pruning are comparing it to the libraries listed below.
- ☆10 Nov 4, 2022 Updated 3 years ago
- [NeurIPS 2023] Token-Scaled Logit Distillation for Ternary Weight Generative Language Models ☆18 Dec 6, 2023 Updated 2 years ago
- This repo investigates LLMs' tendency to exhibit acquiescence bias in sequential QA interactions. Includes evaluation methods, datasets, … ☆17 Apr 24, 2026 Updated last month
- This repo uses a combination of logits and feature distillation method to teach the PSPNet model of ResNet18 backbone with the PSPNet mod… ☆11 Sep 30, 2021 Updated 4 years ago
- Hardware-accelerated matrix/numeric programming library for Swift ☆12 Sep 2, 2025 Updated 9 months ago
- ☆10 Aug 18, 2022 Updated 3 years ago
- ☆13 Sep 24, 2023 Updated 2 years ago
- PyTorch implementation of "Nextformer: A ConvNeXt Augmented Conformer For End-To-End Speech Recognition" ☆10 Dec 15, 2022 Updated 3 years ago
- [ICLR 2023] PyTorch code for DFPC: Data flow driven pruning of coupled channels without data. ☆15 Aug 25, 2023 Updated 2 years ago
- Naive spectrogram built with accelerate ☆14 Sep 15, 2018 Updated 7 years ago
- INT-Q Extension of the CMSIS-NN library for ARM Cortex-M target ☆18 Jan 10, 2020 Updated 6 years ago
- final-project-level3-nlp-02 created by GitHub Classroom ☆11 Dec 31, 2021 Updated 4 years ago
- ☆48 Aug 7, 2023 Updated 2 years ago
- 3D Telecommunications project utilizing Holoportation technology to provide live volumetric capture. Used in one case to increase the re… ☆22 Apr 15, 2026 Updated 2 months ago
- [COLM 2025] Official PyTorch implementation of "Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models" ☆74 Jul 8, 2025 Updated 11 months ago
- Proof system for Fact Verification ☆14 Jun 7, 2022 Updated 4 years ago
- ☆21 Sep 2, 2020 Updated 5 years ago
- 😎 Awesome papers on token redundancy reduction ☆13 Mar 12, 2025 Updated last year
- Code for High-Capacity Expert Binary Networks (ICLR 2021). ☆27 Dec 3, 2021 Updated 4 years ago
- IC implementation of TPU ☆155 Dec 18, 2019 Updated 6 years ago
- [ACL-IJCNLP 2021] "EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets" by Xiaohan Chen, Yu Cheng, Shuohang Wang, Zhe Gan, … ☆18 Dec 30, 2021 Updated 4 years ago
- Code for Analyzing Redundancy in Pretrained Transformer Models accepted at EMNLP 2020 ☆14 Oct 6, 2020 Updated 5 years ago
- Codes for "NAST: A Non-Autoregressive Generator with Word Alignment for Unsupervised Text Style Transfer" (ACL 2021 findings) ☆15 Nov 3, 2021 Updated 4 years ago
- This repo contains the source code for: Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs ☆43 Aug 14, 2024 Updated last year
- ☆10 Nov 22, 2022 Updated 3 years ago
- An original package of the dynamic compressive gammachirp filterbank (dcGC-FB) ☆14 Oct 27, 2024 Updated last year
- Code for "Structured Sparsity Inducing Adaptive Optimizers for Deep Learning" in PyTorch ☆18 Feb 11, 2021 Updated 5 years ago
- VHDL Implementation ☆15 Oct 9, 2014 Updated 11 years ago
- Code for RECENT ☆13 Dec 18, 2022 Updated 3 years ago
- End-to-end neural table-text understanding models. ☆10 Nov 11, 2020 Updated 5 years ago
- ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NIPS'24) ☆53 Dec 17, 2024 Updated last year
- [CVPR2023] Practical Network Acceleration with Tiny Sets ☆13 Jul 28, 2023 Updated 2 years ago
- Official code for the SIGIR 2025 accepted paper "CDC: Causal Domain Clustering for Multi-Domain Recommendation". ☆15 Aug 27, 2025 Updated 9 months ago
- ✨ PyTorch implementation of "Cora: Correspondence-aware Image Editing Using Few-Step Diffusion", accepted at SIGGRAPH 2025. ☆34 Jun 3, 2025 Updated last year
- Deploy mlflow models as JSON APIs with minimal new code ☆21 Apr 10, 2026 Updated 2 months ago
- ☆17 Feb 28, 2018 Updated 8 years ago
- compare the theory attention gradient with PyTorch attention gradient ☆16 Apr 1, 2024 Updated 2 years ago
- [ICML2024] "FedLMT: Tackling System Heterogeneity of Federated Learning via Low-Rank Model Training with Theoretical Guarantees" by Jiaha… ☆14 Sep 22, 2024 Updated last year
- [ICML2025] KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference ☆28 Jan 27, 2026 Updated 4 months ago