sebulo / LoQTLinks

☆79

Alternatives and similar repositories for LoQT

Users that are interested in LoQT are comparing it to the libraries listed below

Sorting:

BlinkDL / modded-nanogpt-rwkv
RWKV-7: Surpassing GPT
☆92Updated 7 months ago
TRI-ML / linear_open_lm
A repository for research on medium sized language models.
☆77Updated last year
OpenEvaByte / evabyte
EvaByte: Efficient Byte-level Language Models at Scale
☆103Updated 2 months ago
BorealisAI / flora-opt
This is the official repository for the paper "Flora: Low-Rank Adapters Are Secretly Gradient Compressors" in ICML 2024.
☆104Updated last year
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆128Updated 7 months ago
kyleliang919 / Online-Subspace-Descent
[NeurIPS 2024] Low rank memory efficient optimizer without SVD
☆30Updated 2 weeks ago
tiiuae / onebitllms
Lightweight toolkit package to train and fine-tune 1.58bit Language models
☆81Updated last month
kyleliang919 / Super_Muon
☆59Updated 3 months ago
minyoungg / LTE
☆68Updated last year
HazyResearch / lolcats
Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"
☆243Updated 5 months ago
RobertCsordas / moeut
☆82Updated 10 months ago
VITA-Group / Q-GaLore
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
☆198Updated last year
RobertCsordas / moe_attention
Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"
☆98Updated 9 months ago
nanowell / Q-Sparse-LLM
My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
☆33Updated 11 months ago
wdlctc / mini-s
☆51Updated 8 months ago
VITA-Group / WeLore
From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu,…
☆47Updated 2 months ago
schwartz-lab-NLP / TOVA
Token Omission Via Attention
☆128Updated 9 months ago
wuhy68 / Parameter-Efficient-MoE
Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks
☆144Updated 9 months ago
sanyalsunny111 / LLM-Inheritune
This is the official repository for Inheritune.
☆111Updated 5 months ago
IST-DASLab / QuEST
Work in progress.
☆70Updated 2 weeks ago
chu-tianxiang / QuIP-for-all
QuIP quantization
☆54Updated last year
cognitivecomputations / grokadamw
☆134Updated 10 months ago
astramind-ai / BitMat
An efficent implementation of the method proposed in "The Era of 1-bit LLMs"
☆154Updated 9 months ago
Zyphra / Zamba2
PyTorch implementation of models from the Zamba2 series.
☆183Updated 5 months ago
SalesforceAIResearch / GemFilter
☆80Updated 6 months ago
snu-mllab / KVzip
Query-agnostic KV cache eviction: 3–4× reduction in memory and 2× decrease in latency (Qwen3/2.5, Gemma3, LLaMA3)
☆91Updated last week
Cornell-RelaxML / qtip
☆139Updated 3 weeks ago
itsnamgyu / block-transformer
Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)
☆158Updated 3 months ago
vmarinowski / infini-attention
An unofficial pytorch implementation of 'Efficient Infinite Context Transformers with Infini-attention'
☆52Updated 10 months ago
LLM360 / k2-train
☆50Updated last year