google-deepmind / recurrentgemma
Open weights language model from Google DeepMind, based on Griffin.
☆606 · Updated 4 months ago
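Since recurrentgemma is based on the Griffin architecture, the following is a minimal, illustrative PyTorch sketch of the kind of gated linear recurrence (RG-LRU style) that Griffin uses in place of attention in its recurrent blocks. This is not the recurrentgemma code: the class and parameter names (`SimpleGatedLinearRecurrence`, `decay_logit`) and the exponent constant are assumptions made for illustration, and the real implementation uses a fused scan kernel rather than a Python loop over time steps.

```python
# Simplified sketch of a Griffin-style gated linear recurrence (RG-LRU).
# Illustrative only; NOT the recurrentgemma implementation.
import torch
import torch.nn as nn


class SimpleGatedLinearRecurrence(nn.Module):
    """Per-channel linear recurrence with input and recurrence gates."""

    def __init__(self, dim: int):
        super().__init__()
        self.input_gate = nn.Linear(dim, dim)
        self.recurrence_gate = nn.Linear(dim, dim)
        # Learnable decay logit; sigmoid keeps the base decay in (0, 1).
        self.decay_logit = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        batch, seq_len, dim = x.shape
        a = torch.sigmoid(self.decay_logit)            # base decay per channel
        i_t = torch.sigmoid(self.input_gate(x))        # input gate
        r_t = torch.sigmoid(self.recurrence_gate(x))   # recurrence gate
        # Input-dependent decay: a ** (c * r_t); c = 8.0 is an assumed constant.
        a_t = a.pow(8.0 * r_t)
        gated_x = i_t * x
        h = torch.zeros(batch, dim, device=x.device, dtype=x.dtype)
        outputs = []
        for t in range(seq_len):
            # h_t = a_t * h_{t-1} + sqrt(1 - a_t^2) * (i_t * x_t)
            h = a_t[:, t] * h + torch.sqrt(1.0 - a_t[:, t] ** 2) * gated_x[:, t]
            outputs.append(h)
        return torch.stack(outputs, dim=1)


if __name__ == "__main__":
    block = SimpleGatedLinearRecurrence(dim=16)
    y = block(torch.randn(2, 10, 16))
    print(y.shape)  # torch.Size([2, 10, 16])
```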
Related projects
Alternatives and complementary repositories for recurrentgemma
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆479 · Updated 2 weeks ago
- A small code base for training large models ☆264 · Updated last week
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch ☆474 · Updated 2 weeks ago
- Annotated version of the Mamba paper ☆455 · Updated 8 months ago
- [ICML 2024] CLLMs: Consistency Large Language Models ☆350 · Updated 3 months ago
- Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling" ☆801 · Updated 2 months ago
- Minimalistic large language model 3D-parallelism training ☆1,227 · Updated this week
- Flash Attention in ~100 lines of CUDA (forward pass only) ☆609 · Updated 7 months ago
- NanoGPT (124M) quality in 8.2 minutes ☆946 · Updated this week
- Helpful tools and examples for working with flex-attention ☆460 · Updated 2 weeks ago
- Best practices & guides on how to write distributed PyTorch training code ☆278 · Updated this week
- Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture" ☆536 · Updated 5 months ago
- Transformers with Arbitrarily Large Context ☆637 · Updated 2 months ago
- Minimalistic, extremely fast, and hackable researcher's toolbench for GPT models in 307 lines of code. Reaches <3.8 validation loss on wi… ☆332 · Updated 3 months ago
- Official implementation of Half-Quadratic Quantization (HQQ) ☆698 · Updated last week
- Fast bare-bones BPE for modern tokenizer training ☆142 · Updated 2 weeks ago
- Puzzles for learning Triton ☆1,068 · Updated last month
- For optimization algorithm research and development. ☆408 · Updated this week
- PyTorch implementation of models from the Zamba2 series. ☆158 · Updated 2 months ago
- Tile primitives for speedy kernels ☆1,629 · Updated this week
- Official codebase for the paper "Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping". ☆313 · Updated 4 months ago
- MINT-1T: A one trillion token multimodal interleaved dataset. ☆770 · Updated 3 months ago
- Long context evaluation for large language models ☆185 · Updated this week
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models". ☆261 · Updated last year