google-deepmind / recurrentgemma
Open weights language model from Google DeepMind, based on Griffin.
☆652 Updated 3 months ago
Alternatives and similar repositories for recurrentgemma
Users who are interested in recurrentgemma compare it to the libraries listed below.
- [ICLR 2025] Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling ☆912 Updated 5 months ago
- a small code base for training large models ☆310 Updated 5 months ago
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers. ☆321 Updated 11 months ago
- A complete end-to-end pipeline for LLM interpretability with sparse autoencoders (SAEs) using Llama 3.2, written in pure PyTorch and full… ☆625 Updated 6 months ago
- Annotated version of the Mamba paper ☆490 Updated last year
- Visualize the intermediate output of Mistral 7B ☆371 Updated 8 months ago
- A pure NumPy implementation of Mamba. ☆223 Updated last year
- Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax ☆667 Updated this week
- ☆309 Updated last year
- Fast bare-bones BPE for modern tokenizer training ☆164 Updated 3 months ago
- ☆281 Updated last year
- Official repository for the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" ☆563 Updated last year
- ☆196 Updated last month
- The repository for the code of the UltraFastBERT paper ☆519 Updated last year
- A JAX research toolkit for building, editing, and visualizing neural networks. ☆1,821 Updated 3 months ago
- Reference implementation of Megalodon 7B model ☆523 Updated 4 months ago
- ☆537 Updated last year
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆576 Updated last month
- For optimization algorithm research and development. ☆539 Updated last week
- Official codebase for the paper "Beyond A* Better Planning with Transformers via Search Dynamics Bootstrapping". ☆373 Updated last year
- Visualizing the internal board state of a GPT trained on chess PGN strings, and performing interventions on its internal board state and … ☆213 Updated 10 months ago
- Scalable and Performant Data Loading ☆304 Updated last week
- Pax is a Jax-based machine learning framework for training large scale models. Pax allows for advanced and fully configurable experimenta… ☆536 Updated last month
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch ☆540 Updated 4 months ago
- Normalized Transformer (nGPT) ☆191 Updated 10 months ago
- Felafax is building AI infra for non-NVIDIA GPUs ☆567 Updated 8 months ago
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆343 Updated 9 months ago
- Understand and test language model architectures on synthetic tasks. ☆226 Updated last week
- An interactive HTML pretty-printer for machine learning research in IPython notebooks. ☆445 Updated last month
- Minimalistic, extremely fast, and hackable researcher's toolbench for GPT models in 307 lines of code. Reaches <3.8 validation loss on wi… ☆349 Updated last year