google-deepmind / recurrentgemma
Open weights language model from Google DeepMind, based on Griffin.
☆661 Updated last week
Alternatives and similar repositories for recurrentgemma
Users who are interested in recurrentgemma are comparing it to the libraries listed below.
- a small code base for training large models ☆320 Updated 9 months ago
- Annotated version of the Mamba paper ☆495 Updated last year
- [ICLR 2025] Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling ☆943 Updated 2 months ago
- Minimalistic, extremely fast, and hackable researcher's toolbench for GPT models in 307 lines of code. Reaches <3.8 validation loss on wi… ☆355 Updated last year
- Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture" ☆562 Updated last year
- Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax ☆694 Updated this week
- ☆207 Updated 2 weeks ago
- Visualize the intermediate output of Mistral 7B ☆383 Updated last year
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers. ☆347 Updated last year
- Reference implementation of Megalodon 7B model ☆529 Updated 8 months ago
- Official codebase for the paper "Beyond A* Better Planning with Transformers via Search Dynamics Bootstrapping". ☆375 Updated last year
- ☆314 Updated last year
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch ☆549 Updated 8 months ago
- ☆289 Updated last year
- Visualizing the internal board state of a GPT trained on chess PGN strings, and performing interventions on its internal board state and … ☆218 Updated last year
- The repository for the code of the UltraFastBERT paper ☆519 Updated last year
- Fast bare-bones BPE for modern tokenizer training ☆174 Updated 7 months ago
- A pure NumPy implementation of Mamba. ☆222 Updated last year
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆370 Updated last year
- For optimization algorithm research and development. ☆558 Updated 3 weeks ago
- A JAX research toolkit for building, editing, and visualizing neural networks. ☆1,855 Updated 7 months ago
- A complete end-to-end pipeline for LLM interpretability with sparse autoencoders (SAEs) using Llama 3.2, written in pure PyTorch and full… ☆628 Updated 10 months ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆596 Updated 5 months ago
- ☆558 Updated last year
- Inference code for Persimmon-8B ☆412 Updated 2 years ago
- Understand and test language model architectures on synthetic tasks. ☆251 Updated 3 weeks ago
- Accelerate, Optimize performance with streamlined training and serving options with JAX. ☆334 Updated 3 weeks ago
- Long context evaluation for large language models ☆225 Updated 10 months ago
- [ICML 2024] CLLMs: Consistency Large Language Models ☆411 Updated last year
- PyTorch implementation of models from the Zamba2 series. ☆186 Updated last year