kjslag / spacebyteLinks

A byte-level decoder architecture that matches the performance of tokenized Transformers.

☆64

Alternatives and similar repositories for spacebyte

Users that are interested in spacebyte are comparing it to the libraries listed below

Sorting:

lucidrains / PEER-pytorch
Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind
☆127Updated 11 months ago
proger / hippogriff
Griffin MQA + Hawk Linear RNN Hybrid
☆88Updated last year
RobertCsordas / moeut
☆83Updated 11 months ago
RobertCsordas / moe_attention
Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"
☆98Updated 10 months ago
BlinkDL / modded-nanogpt-rwkv
RWKV-7: Surpassing GPT
☆94Updated 8 months ago
joey00072 / ohara
Collection of autoregressive model implementation
☆86Updated 3 months ago
NX-AI / mlstm_kernels
Tiled Flash Linear Attention library for fast and efficient mLSTM Kernels.
☆65Updated this week
epfml / DenseFormer
☆81Updated last year
OpenEvaByte / evabyte
EvaByte: Efficient Byte-level Language Models at Scale
☆103Updated 3 months ago
HazyResearch / based
Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"
☆238Updated last month
itsnamgyu / block-transformer
Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)
☆160Updated 3 months ago
wdlctc / mini-s
☆51Updated 9 months ago
RobertCsordas / moe
Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers"
☆38Updated last month
lucidrains / grokfast-pytorch
Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients"
☆101Updated 7 months ago
Zyphra / Zamba2
PyTorch implementation of models from the Zamba2 series.
☆184Updated 6 months ago
Zyphra / Zyda_processing
☆37Updated last year
RWKV / ZeroCoT
https://x.com/BlinkDL_AI/status/1884768989743882276
☆28Updated 2 months ago
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆127Updated 8 months ago
SeunghyunSEO / optimized_hf_llama_class_for_training
☆48Updated 11 months ago
schwartz-lab-NLP / TOVA
Token Omission Via Attention
☆128Updated 9 months ago
lucidrains / infini-transformer-pytorch
Implementation of Infini-Transformer in Pytorch
☆111Updated 7 months ago
frankxwang / dpo-prefix-sharing
DPO, but faster 🚀
☆43Updated 7 months ago
fal-ai-community / nano-mdm
Tiny re-implementation of MDM in style of LLaDA and nano-gpt speedrun
☆55Updated 4 months ago
sanyalsunny111 / LLM-Inheritune
This is the official repository for Inheritune.
☆112Updated 5 months ago
lucidrains / pause-transformer
Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount…
☆54Updated last year
vmarinowski / infini-attention
An unofficial pytorch implementation of 'Efficient Infinite Context Transformers with Infini-attention'
☆52Updated 11 months ago
recursal / GoldFinch-paper
GoldFinch and other hybrid transformer components
☆46Updated last year
VITA-Group / Q-GaLore
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
☆198Updated last year
huyphan168 / PEER
Mixture of A Million Experts
☆46Updated last year
LucasPrietoAl / grokking-at-the-edge-of-numerical-stability
☆100Updated last week