dropbox / low-rank-llama2
Low-Rank Llama Custom Training
☆23 · Mar 27, 2024 · Updated last year
Alternatives and similar repositories for low-rank-llama2
Users who are interested in low-rank-llama2 are comparing it to the libraries listed below:
- [NAACL'25 🏆 SAC Award] Official code for "Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert… ☆14 · Feb 4, 2025 · Updated last year
- ☆31 · Nov 11, 2024 · Updated last year
- ☆12 · May 22, 2022 · Updated 3 years ago
- Code Implementation for "NASH: A Simple Unified Framework of Structured Pruning for Accelerating Encoder-Decoder Language Models" (EMNLP … ☆17 · Oct 17, 2023 · Updated 2 years ago
- [EMNLP 2024] Quantize LLM to extremely low-bit, and finetune the quantized LLMs ☆15 · Jul 18, 2024 · Updated last year
- [ICML'24 Oral] APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference ☆46 · Jun 4, 2024 · Updated last year
- [ICLR'25] ARB-LLM: Alternating Refined Binarizations for Large Language Models ☆28 · Aug 5, 2025 · Updated 6 months ago
- [ICML2024 Spotlight] Fine-Tuning Pre-trained Large Language Models Sparsely ☆24 · Jun 26, 2024 · Updated last year
- Implementation of PGONAS for CVPR22W and RD-NAS for ICASSP23 ☆23 · Apr 25, 2023 · Updated 2 years ago
- Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs ☆23 · Nov 11, 2025 · Updated 3 months ago
- ☆63 · Oct 17, 2023 · Updated 2 years ago
- Official PyTorch implementation of "Meta-prediction Model for Distillation-Aware NAS on Unseen Datasets" (ICLR 2023 notable top 25%) ☆26 · Mar 18, 2024 · Updated last year
- [ICLR 2021] CompOFA: Compound Once-For-All Networks For Faster Multi-Platform Deployment ☆25 · Jan 5, 2023 · Updated 3 years ago
- Implementation for the paper: CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference ☆34 · Mar 6, 2025 · Updated 11 months ago
- ☆30 · Jul 22, 2024 · Updated last year
- Official PyTorch implementation of "DiGA: Distil to Generalize and then Adapt for Domain Adaptive Semantic Segmentation" (CVPR 2023) ☆29 · Apr 1, 2024 · Updated last year
- [ICLR 2022] "Learning Pruning-Friendly Networks via Frank-Wolfe: One-Shot, Any-Sparsity, and No Retraining" by Lu Miao*, Xiaolong Luo*, T… ☆33 · Jan 20, 2022 · Updated 4 years ago
- Code for the paper "RankingGPT: Empowering Large Language Models in Text Ranking with Progressive Enhancement" ☆34 · Jan 9, 2024 · Updated 2 years ago
- ☆12 · Jan 10, 2026 · Updated last month
- Official Pytorch Implementation of "Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity" ☆82 · Jul 7, 2025 · Updated 7 months ago
- [NeurIPS 2023] Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective ☆39 · Oct 17, 2023 · Updated 2 years ago
- NAS Benchmark in "Prioritized Architecture Sampling with Monto-Carlo Tree Search", CVPR2021 ☆37 · Aug 24, 2021 · Updated 4 years ago
- [ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs ☆228 · Jan 11, 2025 · Updated last year
- ☆235 · Jun 11, 2024 · Updated last year
- An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization ☆172 · Nov 26, 2025 · Updated 2 months ago
- Generic Neural Architecture Search via Regression (NeurIPS'21 Spotlight) ☆36 · Aug 29, 2022 · Updated 3 years ago
- Parses Facebook chat messages into Python objects to enable convenient analysis. ☆11 · Jan 3, 2018 · Updated 8 years ago
- TSDG: An efficient index graph for graph-based nearest neighbor search ☆10 · Jul 14, 2022 · Updated 3 years ago
- A CLI tool to help you easily delete forked repositories. ☆10 · Updated this week
- Supporting code for "LLMs for your iPhone: Whole-Tensor 4 Bit Quantization" ☆11 · Mar 31, 2024 · Updated last year
- Supply chain demo based on the FISCO-BCOS blockchain, with a backend built in Node.js ☆10 · Jan 28, 2021 · Updated 5 years ago
- rabitq rust implementation ☆10 · Feb 4, 2026 · Updated last week
- Pluralsight Reporting API - Python Client ☆11 · Apr 13, 2017 · Updated 8 years ago
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining. ☆49 · Oct 29, 2025 · Updated 3 months ago
- [NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs. ☆180 · Oct 3, 2024 · Updated last year
- The official implementation of PTQD: Accurate Post-Training Quantization for Diffusion Models ☆103 · Mar 12, 2024 · Updated last year
- Fastai+PyTorch implementation of sparse model training methods (SET, SNFS, RigL) + customize-your-own. ☆10 · Oct 20, 2022 · Updated 3 years ago
- Common template for PyTorch projects. Easy to extend and modify for a new project. ☆13 · Dec 13, 2022 · Updated 3 years ago
- Simple and clean Python implementation of TextRank as per seminal paper by Rada Mihalcea and Paul Tarau. This implementation performs bot… ☆11 · Jan 26, 2021 · Updated 5 years ago