BlinkDL / modded-nanogpt-rwkv
RWKV-7: Surpassing GPT
☆42 Updated this week
Related projects
Alternatives and complementary repositories for modded-nanogpt-rwkv
- GoldFinch and other hybrid transformer components ☆39 Updated 3 months ago
- ☆49 Updated 7 months ago
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated ☆30 Updated 2 months ago
- A byte-level decoder architecture that matches the performance of tokenized Transformers. ☆59 Updated 6 months ago
- Experiments for efforts to train a new and improved t5 ☆76 Updated 6 months ago
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆105 Updated 2 weeks ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆83 Updated last week
- ☆62 Updated last month
- Fast, Modern, Memory Efficient, and Low Precision PyTorch Optimizers ☆58 Updated 3 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆104 Updated last month
- Collection of autoregressive model implementations ☆66 Updated last week
- RWKV in nanoGPT style ☆177 Updated 5 months ago
- Demonstration that finetuning a RoPE model on longer sequences than those used in pre-training adapts the model's context limit ☆63 Updated last year
- Let us make Psychohistory (as in Asimov) a reality, and accessible to everyone. Useful for LLM grounding and games / fiction / business /… ☆40 Updated last year
- A repository for research on medium sized language models. ☆74 Updated 5 months ago
- Fast modular code to create and train cutting edge LLMs ☆65 Updated 5 months ago
- PyTorch half precision gemm lib w/ fused optional bias + optional relu/gelu ☆38 Updated 2 months ago
- Muon optimizer for neural networks: >30% extra sample efficiency, <3% wallclock overhead ☆69 Updated this week
- The simplest, fastest repository for training/finetuning medium-sized xLSTMs. ☆38 Updated 5 months ago
- Implementation of the Mamba SSM with hf_integration. ☆55 Updated 2 months ago
- ☆53 Updated 9 months ago
- Scaling is a distributed training library and installable dependency designed to scale up neural networks, with a dedicated module for tr… ☆46 Updated last week
- Token Omission Via Attention ☆119 Updated 3 weeks ago
- ☆76 Updated 6 months ago
- PyTorch implementation of models from the Zamba2 series. ☆158 Updated this week
- ☆61 Updated 2 months ago
- ☆72 Updated 4 months ago
- [WIP] Transformer to embed Danbooru labelsets ☆13 Updated 7 months ago