JL-er / DiSHA

☆13

Alternatives and similar repositories for DiSHA

Users that are interested in DiSHA are comparing it to the libraries listed below

Sorting:

OpenMOSE / RWKV-Infer
A large-scale RWKV v6, v7(World, ARWKV, PRWKV) inference. Capable of inference by combining multiple states(Pseudo MoE). Easy to deploy o…
☆35Updated last week
howard-hou / RWKV-X
RWKV-X is a Linear Complexity Hybrid Language Model based on the RWKV architecture, integrating Sparse Attention to improve the model's l…
☆30Updated 2 weeks ago
SmerkyG / RWKV_Explained
RWKV, in easy to read code
☆72Updated last month
yynil / RWKVInside
☆34Updated 2 weeks ago
hamishivi / tess-2
Repository for "TESS-2: A Large-Scale, Generalist Diffusion Language Model"
☆34Updated 2 months ago
RWKV / ZeroCoT
https://x.com/BlinkDL_AI/status/1884768989743882276
☆28Updated last week
AGENDD / RWKV-ASR
This repo is an exploratory experiment to enable frozen pretrained RWKV language models to accept speech modality input. We followed the …
☆49Updated 4 months ago
IST-DASLab / peft-rosa
A fork of the PEFT library, supporting Robust Adaptation (RoSA)
☆14Updated 8 months ago
nanowell / Q-Sparse-LLM
My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
☆32Updated 9 months ago
Jellyfish042 / uncheatable_eval
Evaluating LLMs with Dynamic Data
☆87Updated 3 weeks ago
frankxwang / dpo-prefix-sharing
DPO, but faster 🚀
☆42Updated 5 months ago
SmerkyG / gptcore
Fast modular code to create and train cutting edge LLMs
☆66Updated 11 months ago
yynil / RWKVinLLAMA
☆18Updated 4 months ago
VITA-Group / WeLore
From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu,…
☆47Updated 3 weeks ago
BlinkDL / LinearAttentionArena
Here we will test various linear attention designs.
☆60Updated last year
lucidrains / light-recurrent-unit-pytorch
Implementation of a Light Recurrent Unit in Pytorch
☆46Updated 7 months ago
JL-er / RWKV-PEFT
☆121Updated this week
nil0x9 / flash-muon
Flash-Muon: An Efficient Implementation of Muon Optimizer
☆103Updated last week
Jellyfish042 / RWKV-StateTuning
State tuning tunes the state
☆32Updated 3 months ago
Zyphra / Zyda_processing
☆33Updated 10 months ago
BBuf / RWKV-World-HF-Tokenizer
☆34Updated 9 months ago
BlinkDL / modded-nanogpt-rwkv
RWKV-7: Surpassing GPT
☆84Updated 5 months ago
kyleliang919 / Online-Subspace-Descent
This repo is based on https://github.com/jiaweizzhao/GaLore
☆27Updated 7 months ago
NX-AI / mlstm_kernels
Tiled Flash Linear Attention library for fast and efficient mLSTM Kernels.
☆56Updated this week
kyegomez / Blockwise-Parallel-Transformer
32 times longer context window than vanilla Transformers and up to 4 times longer than memory efficient Transformers.
☆48Updated last year
00ffcc / chunkRWKV6
continous batching and parallel acceleration for RWKV6
☆24Updated 10 months ago
wdlctc / mini-s
☆51Updated 6 months ago
JL-er / WorldRWKV
The WorldRWKV project aims to implement training and inference across various modalities using the RWKV7 architecture. By leveraging diff…
☆45Updated last month
fal-ai-community / nano-mdm
Tiny re-implementation of MDM in style of LLaDA and nano-gpt speedrun
☆50Updated 2 months ago
princeton-pli / MeCo
Code for preprint "Metadata Conditioning Accelerates Language Model Pre-training (MeCo)"
☆38Updated last week