mobiusml / low-rank-llama2Links

Low-Rank Llama Custom Training

☆23

Alternatives and similar repositories for low-rank-llama2

Users that are interested in low-rank-llama2 are comparing it to the libraries listed below

Sorting:

feifeibear / Odysseus-Transformer
Odysseus: Playground of LLM Sequence Parallelism
☆77Updated last year
ScalingIntelligence / CATS
☆28Updated 11 months ago
FasterDecoding / TEAL
☆143Updated 7 months ago
IST-DASLab / SparseFinetuning
Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry
☆42Updated last year
Dao-AILab / grouped-latent-attention
☆129Updated 4 months ago
fmfi-compbio / admm-pruning
☆30Updated last year
sail-sg / SimLayerKV
The official implementation of paper: SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction.
☆49Updated 11 months ago
rayleizhu / vllm-ra
[ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts
☆40Updated last year
li-plus / flash-preference
Accelerate LLM preference tuning via prefix sharing with a single line of code
☆43Updated 3 months ago
opengear-project / GEAR
GEAR: An Efficient KV Cache Compression Recipefor Near-Lossless Generative Inference of LLM
☆168Updated last year
pprp / Pruner-Zero
[ICML24] Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs
☆94Updated 10 months ago
TianjinYellow / StableSPAM
☆24Updated 6 months ago
Qualcomm-AI-research / gptvq
☆35Updated last year
SqueezeAILab / SqueezedAttention
SQUEEZED ATTENTION: Accelerating Long Prompt LLM Inference
☆54Updated 10 months ago
IST-DASLab / Sparse-Marlin
Boosting 4-bit inference kernels with 2:4 Sparsity
☆82Updated last year
andy-yang-1 / DoubleSparse
16-fold memory access reduction with nearly no loss
☆105Updated 6 months ago
z-lab / sparselora
[ICML 2025] SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity
☆58Updated 3 months ago
hahnyuan / PB-LLM
PB-LLM: Partially Binarized Large Language Models
☆155Updated last year
haochengxi / Train_Transformers_with_INT4
☆156Updated 2 years ago
facebookresearch / Ternary_Binary_Transformer
ACL 2023
☆39Updated 2 years ago
machilusZ / FastGen
This repo contains the source code for: Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs
☆40Updated last year
kyegomez / Blockwise-Parallel-Transformer
32 times longer context window than vanilla Transformers and up to 4 times longer than memory efficient Transformers.
☆48Updated 2 years ago
IST-DASLab / MicroAdam
This repository contains code for the MicroAdam paper.
☆19Updated 9 months ago
hahnyuan / ASVD4LLM
Activation-aware Singular Value Decomposition for Compressing Large Language Models
☆80Updated 11 months ago
epfml / dynamic-sparse-flash-attention
☆148Updated 2 years ago
thu-ml / low-bit-optimizers
Low-bit optimizers for PyTorch
☆131Updated 2 years ago
tilde-research / nsa-impl
An efficient implementation of the NSA (Native Sparse Attention) kernel
☆119Updated 3 months ago
Infini-AI-Lab / gsm_infinite
☆54Updated 4 months ago
Infini-AI-Lab / Kinetics
Kinetics: Rethinking Test-Time Scaling Laws
☆80Updated 3 months ago
zyxxmu / cam
Pytorch implementation of our paper accepted by ICML 2024 -- CaM: Cache Merging for Memory-efficient LLMs Inference
☆46Updated last year