google-deepmind / recurrentgemma
Open weights language model from Google DeepMind, based on Griffin.
☆607 · Updated 4 months ago
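Griffin mixes gated linear recurrences (the RG-LRU) with local attention. As a rough orientation, here is a minimal NumPy sketch of the RG-LRU recurrence as described in the Griffin paper (De et al., 2024); the function and weight names are invented for illustration and do not reflect this repository's actual API.

```python
# Conceptual sketch of the RG-LRU recurrence from the Griffin paper.
# Plain NumPy illustration only; the names (rg_lru, w_a, w_x, lam)
# are hypothetical and not part of the recurrentgemma codebase.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rg_lru(x, w_a, w_x, lam, c=8.0):
    """Run the real-gated linear recurrent unit over a sequence.

    x:    (seq_len, dim) input activations
    w_a:  (dim, dim) recurrence-gate weights   (illustrative)
    w_x:  (dim, dim) input-gate weights        (illustrative)
    lam:  (dim,) parameter setting the decay a = sigmoid(lam)
    """
    seq_len, dim = x.shape
    h = np.zeros(dim)
    a = sigmoid(lam)                  # per-channel decay in (0, 1)
    out = np.empty_like(x)
    for t in range(seq_len):
        r_t = sigmoid(x[t] @ w_a)     # recurrence gate
        i_t = sigmoid(x[t] @ w_x)     # input gate
        a_t = a ** (c * r_t)          # gated, per-step decay
        # Normalized update: the sqrt(1 - a_t^2) factor keeps the
        # hidden state's scale bounded regardless of the decay.
        h = a_t * h + np.sqrt(1.0 - a_t**2) * (i_t * x[t])
        out[t] = h
    return out
```

Because the recurrence is linear in `h`, it admits the fast parallel scans the real implementation uses; the sequential loop above is just the easiest form to read.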
Related projects
Alternatives and complementary repositories for recurrentgemma
- Annotated version of the Mamba paper ☆457 · Updated 8 months ago
- Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling" ☆803 · Updated 3 months ago
- ☆197 · Updated 4 months ago
- a small code base for training large models ☆266 · Updated 2 weeks ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆483 · Updated 3 weeks ago
- Visualize the intermediate output of Mistral 7B ☆313 · Updated 9 months ago
- ☆292 · Updated 4 months ago
- A JAX research toolkit for building, editing, and visualizing neural networks. ☆1,679 · Updated this week
- Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture" ☆537 · Updated 6 months ago
- Tile primitives for speedy kernels ☆1,658 · Updated this week
- NanoGPT (124M) quality in 7.8 8xH100-minutes ☆1,033 · Updated this week
- Official repository for the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" ☆515 · Updated 4 months ago
- A Jax-based library for designing and training transformer models from scratch. ☆276 · Updated 2 months ago
- Minimalistic, extremely fast, and hackable researcher's toolbench for GPT models in 307 lines of code. Reaches <3.8 validation loss on wi… ☆334 · Updated 3 months ago
- Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax ☆516 · Updated this week
- Official codebase for the paper "Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping". ☆322 · Updated 5 months ago
- PyTorch implementation of models from the Zamba2 series. ☆158 · Updated this week
- [ICML 2024] CLLMs: Consistency Large Language Models ☆353 · Updated this week
- A pure NumPy implementation of Mamba. ☆216 · Updated 4 months ago
- Flash Attention in ~100 lines of CUDA (forward pass only) ☆626 · Updated 7 months ago
- Pax is a Jax-based machine learning framework for training large scale models. Pax allows for advanced and fully configurable experimenta… ☆457 · Updated last week
- ☆234 · Updated 8 months ago
- Fast bare-bones BPE for modern tokenizer training ☆142 · Updated last month
- Schedule-Free Optimization in PyTorch ☆1,898 · Updated 2 weeks ago
- Some preliminary explorations of Mamba's context scaling. ☆191 · Updated 9 months ago
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch ☆476 · Updated 3 weeks ago
- Minimalistic large language model 3D-parallelism training ☆1,260 · Updated this week
- The repository for the code of the UltraFastBERT paper ☆514 · Updated 7 months ago
- LLM Analytics ☆615 · Updated last month