BlinkDL / modded-nanogpt-rwkvLinks

RWKV-7: Surpassing GPT

☆98

Alternatives and similar repositories for modded-nanogpt-rwkv

Users that are interested in modded-nanogpt-rwkv are comparing it to the libraries listed below

Sorting:

RWKV / ZeroCoT
https://x.com/BlinkDL_AI/status/1884768989743882276
☆28Updated 5 months ago
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆130Updated 10 months ago
TRI-ML / linear_open_lm
A repository for research on medium sized language models.
☆78Updated last year
chu-tianxiang / QuIP-for-all
QuIP quantization
☆59Updated last year
kyleliang919 / Super_Muon
☆64Updated 7 months ago
tanaymeh / mamba-train
A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM
☆59Updated last year
OpenMOSE / RWKV-Infer
A large-scale RWKV v7(World, PRWKV, Hybrid-RWKV) inference. Capable of inference by combining multiple states(Pseudo MoE). Easy to deploy…
☆45Updated last week
recursal / GoldFinch-paper
GoldFinch and other hybrid transformer components
☆45Updated last year
Zyphra / Zamba2
PyTorch implementation of models from the Zamba2 series.
☆185Updated 9 months ago
SmerkyG / gptcore
Fast modular code to create and train cutting edge LLMs
☆68Updated last year
nanowell / Q-Sparse-LLM
My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
☆33Updated last year
HazyResearch / lolcats
Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"
☆248Updated 8 months ago
wdlctc / mini-s
☆52Updated 11 months ago
schwartz-lab-NLP / TOVA
Token Omission Via Attention
☆127Updated last year
OpenEvaByte / evabyte
EvaByte: Efficient Byte-level Language Models at Scale
☆110Updated 6 months ago
IST-DASLab / QuEST
Work in progress.
☆74Updated 3 months ago
RobertCsordas / moe_attention
Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"
☆99Updated last year
astramind-ai / BitMat
An efficent implementation of the method proposed in "The Era of 1-bit LLMs"
☆154Updated last year
yynil / RWKVInside
☆38Updated 5 months ago
VITA-Group / Q-GaLore
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
☆202Updated last year
kjslag / spacebyte
A byte-level decoder architecture that matches the performance of tokenized Transformers.
☆66Updated last year
proger / hippogriff
Griffin MQA + Hawk Linear RNN Hybrid
☆89Updated last year
tiiuae / onebitllms
Lightweight toolkit package to train and fine-tune 1.58bit Language models
☆92Updated 5 months ago
joey00072 / ohara
Collection of autoregressive model implementation
☆86Updated 6 months ago
lucidrains / PEER-pytorch
Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind
☆129Updated last year
samchaineau / llm_slerp_generation
Repo hosting codes and materials related to speeding LLMs' inference using token merging.
☆36Updated 2 weeks ago
QuixiAI / grokadamw
☆136Updated last year
itsnamgyu / block-transformer
Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)
☆162Updated 6 months ago
SmerkyG / RWKV_Explained
RWKV, in easy to read code
☆72Updated 6 months ago
BlinkDL / nanoRWKV
RWKV in nanoGPT style
☆193Updated last year