[DAC 2024] EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive Layer Tuning and Voting
★89 · Jun 30, 2024 · Updated 2 years ago
Alternatives and similar repositories for Edge-LLM
Users that are interested in Edge-LLM are comparing it to the libraries listed below.
- An open-sourced PyTorch library for developing energy efficient multiplication-less models and applications. ★14 · Feb 3, 2025 · Updated last year
- [NAACL'25 SAC Award] Official code for "Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert… ★16 · Feb 4, 2025 · Updated last year
- [ICML 2022] ShiftAddNAS: Hardware-Inspired Search for More Accurate and Efficient Neural Networks ★15 · May 18, 2022 · Updated 4 years ago
- The official code for [ECCV2020] "HALO: Hardware-aware Learning to Optimize" ★10 · Mar 22, 2023 · Updated 3 years ago
- ★29 · Feb 26, 2023 · Updated 3 years ago
- ★122 · Nov 17, 2023 · Updated 2 years ago
- ★12 · May 18, 2024 · Updated 2 years ago
- [ECCV 2022] SuperTickets: Drawing Task-Agnostic Lottery Tickets from Supernets via Jointly Architecture Searching and Parameter Pruning ★20 · Jul 7, 2022 · Updated 3 years ago
- [NeurIPS 2020] ShiftAddNet: A Hardware-Inspired Deep Network ★74 · Nov 16, 2020 · Updated 5 years ago
- Official code for the ECCV2024 paper "Self-Adapting Large Visual-Language Models to Edge Devices across Visual Modalities" ★26 · Oct 1, 2024 · Updated last year
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ★114 · Oct 15, 2024 · Updated last year
- ★10 · Jun 28, 2019 · Updated 7 years ago
- Lab assignments for the Agile Hardware Design course ★19 · Nov 14, 2025 · Updated 7 months ago
- ★258 · Oct 24, 2025 · Updated 8 months ago
- ★59 · Jun 10, 2024 · Updated 2 years ago
- The code for "AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference", Qingyue Yang, Jie Wang, Xing Li, Zhihai Wang, Ch… ★29 · Jul 15, 2025 · Updated 11 months ago
- Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry ★43 · Jan 15, 2024 · Updated 2 years ago
- Code Implementation for "NASH: A Simple Unified Framework of Structured Pruning for Accelerating Encoder-Decoder Language Models" (EMNLP … ★17 · Oct 17, 2023 · Updated 2 years ago
- ★35 · Dec 22, 2025 · Updated 6 months ago
- [NeurIPS 2023] ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer ★30 · Dec 6, 2023 · Updated 2 years ago
- [NeurIPS 2020] "FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training" by Yonggan Fu, Ha… ★10 · Feb 13, 2022 · Updated 4 years ago
- [HPCA'21] SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning ★135 · Aug 27, 2024 · Updated last year
- Official PyTorch Implementation of "Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity" ★81 · Jul 7, 2025 · Updated 11 months ago
- A family of efficient edge language models in 100M~1B sizes. ★19 · Feb 14, 2025 · Updated last year
- A co-design architecture on sparse attention ★55 · Aug 23, 2021 · Updated 4 years ago
- This repo contains the code for studying the interplay between quantization and sparsity methods ★26 · Feb 26, 2025 · Updated last year
- [ICLR'24 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy" ★107 · Jun 20, 2025 · Updated last year
- [HPCA 2022] GCoD: Graph Convolutional Network Acceleration via Dedicated Algorithm and Accelerator Co-Design ★38 · Mar 30, 2022 · Updated 4 years ago
- Adaptive floating-point based numerical format for resilient deep learning ★14 · Apr 11, 2022 · Updated 4 years ago
- [ICML 2021] "Double-Win Quant: Aggressively Winning Robustness of Quantized Deep Neural Networks via Random Precision Training and Inferen… ★16 · Feb 13, 2022 · Updated 4 years ago
- PyTorch code for our paper "Progressive Binarization with Semi-Structured Pruning for LLMs" ★13 · Mar 11, 2026 · Updated 3 months ago
- RTL implementation of Flex-DPE. ★117 · Feb 22, 2020 · Updated 6 years ago
- [ICML24] Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs ★100 · Nov 25, 2024 · Updated last year
- C++ RTL simulator for EIE (https://arxiv.org/abs/1602.01528) ★25 · Mar 17, 2021 · Updated 5 years ago
- PDPU: An Open-Source Posit Dot-Product Unit for Deep Learning Applications ★45 · May 5, 2023 · Updated 3 years ago
- [ICML 2021] "Auto-NBA: Efficient and Effective Search Over the Joint Space of Networks, Bitwidths, and Accelerators" by Yonggan Fu, Yonga… ★16 · Jan 3, 2022 · Updated 4 years ago
- The Reconfigurable Solver for QP ★11 · Apr 19, 2023 · Updated 3 years ago
- ECE 5745 Tutorial 8: SRAM Generators ★16 · Mar 5, 2022 · Updated 4 years ago
- ★16 · Mar 18, 2025 · Updated last year