JL-er / MiSSLinks

MiSS is a novel PEFT method that features a low-rank structure but introduces a new update mechanism distinct from LoRA, achieving an excellent balance between performance and efficiency.

☆23

Alternatives and similar repositories for MiSS

Users that are interested in MiSS are comparing it to the libraries listed below

Sorting:

howard-hou / RWKV-X
RWKV-X is a Linear Complexity Hybrid Language Model based on the RWKV architecture, integrating Sparse Attention to improve the model's l…
☆47Updated 2 months ago
BlinkDL / LinearAttentionArena
Here we will test various linear attention designs.
☆62Updated last year
yynil / RWKVInside
☆38Updated 5 months ago
BlinkDL / modded-nanogpt-rwkv
RWKV-7: Surpassing GPT
☆96Updated 10 months ago
frankxwang / dpo-prefix-sharing
DPO, but faster 🚀
☆44Updated 10 months ago
recursal / RADLADS-paper
RADLADS training code
☆28Updated 5 months ago
kyleliang919 / Super_Muon
☆64Updated 6 months ago
NathanGodey / qfilters
Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812)
☆35Updated 7 months ago
SalesforceAIResearch / GemFilter
☆86Updated 8 months ago
lucidrains / PEER-pytorch
Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind
☆129Updated last year
fal-ai-community / nano-mdm
Tiny re-implementation of MDM in style of LLaDA and nano-gpt speedrun
☆56Updated 6 months ago
SmerkyG / gptcore
Fast modular code to create and train cutting edge LLMs
☆68Updated last year
kyleliang919 / Online-Subspace-Descent
[NeurIPS 2024] Low rank memory efficient optimizer without SVD
☆31Updated 3 months ago
hamishivi / tess-2
Repository for "TESS-2: A Large-Scale, Generalist Diffusion Language Model"
☆50Updated 7 months ago
imoneoi / bf16_fused_adam
BFloat16 Fused Adam Operator for PyTorch
☆15Updated 10 months ago
whyNLP / LCKV
Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance…
☆155Updated 6 months ago
OpenMOSE / RWKV-Infer
A large-scale RWKV v7(World, PRWKV, Hybrid-RWKV) inference. Capable of inference by combining multiple states(Pseudo MoE). Easy to deploy…
☆44Updated 3 weeks ago
TRI-ML / linear_open_lm
A repository for research on medium sized language models.
☆78Updated last year
IST-DASLab / RoSA
Official implementation of the ICML 2024 paper RoSA (Robust Adaptation)
☆44Updated last year
siyan-zhao / prepacking
The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …
☆60Updated 11 months ago
PiotrNawrot / nano-sparse-attention
The simplest implementation of recent Sparse Attention patterns for efficient LLM inference.
☆89Updated 2 months ago
wdlctc / mini-s
☆52Updated 11 months ago
Infini-AI-Lab / S2FT
☆19Updated 9 months ago
itsnamgyu / block-transformer
Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)
☆161Updated 5 months ago
RobertCsordas / moeut
☆85Updated last year
erogol / BlaGPT
Experimental playground for benchmarking language model (LM) architectures, layers, and tricks on smaller datasets. Designed for flexible…
☆81Updated last month
RWKV / ZeroCoT
https://x.com/BlinkDL_AI/status/1884768989743882276
☆28Updated 5 months ago
IST-DASLab / peft-rosa
A fork of the PEFT library, supporting Robust Adaptation (RoSA)
☆15Updated last year
xiayuqing0622 / flex_head_fa
Fast and memory-efficient exact attention
☆70Updated 7 months ago
recursal / GoldFinch-paper
GoldFinch and other hybrid transformer components
☆45Updated last year