google-deepmind / recurrentgemma
Open weights language model from Google DeepMind, based on Griffin.
☆636 · Updated 2 months ago
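
For context, Griffin (the architecture recurrentgemma is based on) interleaves local attention with a gated linear recurrence called the RG-LRU. Below is a minimal, illustrative sketch of that recurrence in JAX; the parameter names, shapes, and the constant `c` are assumptions based on the Griffin paper's description, not the recurrentgemma API:

```python
# Minimal sketch of an RG-LRU-style gated linear recurrence (illustrative
# only; not the recurrentgemma API). Shapes and names are assumptions.
import jax
import jax.numpy as jnp

def rg_lru(x, w_a, w_x, lam, c=8.0):
    """x: (seq_len, dim) inputs; w_a, w_x: (dim, dim); lam: (dim,)."""
    a_base = jax.nn.sigmoid(lam)            # per-channel decay in (0, 1)

    def step(h_prev, x_t):
        r_t = jax.nn.sigmoid(x_t @ w_a)     # recurrence gate
        i_t = jax.nn.sigmoid(x_t @ w_x)     # input gate
        a_t = a_base ** (c * r_t)           # gated, per-channel decay
        h_t = a_t * h_prev + jnp.sqrt(1.0 - a_t**2) * (i_t * x_t)
        return h_t, h_t

    h0 = jnp.zeros(x.shape[-1])
    _, h = jax.lax.scan(step, h0, x)
    return h                                # (seq_len, dim) hidden states
```

Because `a_t` stays in (0, 1), the state decays smoothly, and the `sqrt(1 - a_t**2)` factor keeps the hidden-state scale roughly input-independent; the real blocks add projections, a convolution, and output gating around this core.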
Alternatives and similar repositories for recurrentgemma:
Users interested in recurrentgemma are comparing it to the libraries listed below.
- A small code base for training large models ☆294 · Updated last week
- Annotated version of the Mamba paper ☆483 · Updated last year
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆536 · Updated last week
- [ICLR 2025] Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling ☆867 · Updated this week
- Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax ☆569 · Updated this week
- For optimization algorithm research and development. ☆509 · Updated this week
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers. ☆305 · Updated 6 months ago
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch ☆511 · Updated 6 months ago
- ☆301 · Updated 10 months ago
- Puzzles for exploring transformers ☆344 · Updated 2 years ago
- A complete end-to-end pipeline for LLM interpretability with sparse autoencoders (SAEs) using Llama 3.2, written in pure PyTorch and full… ☆607 · Updated last month
- ☆217 · Updated 9 months ago
- Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture" ☆549 · Updated 4 months ago
- ☆446 · Updated 9 months ago
- Minimalistic, extremely fast, and hackable researcher's toolbench for GPT models in 307 lines of code. Reaches <3.8 validation loss on wi… ☆345 · Updated 9 months ago
- A JAX research toolkit for building, editing, and visualizing neural networks. ☆1,769 · Updated last week
- ☆186 · Updated this week
- Pax is a Jax-based machine learning framework for training large scale models. Pax allows for advanced and fully configurable experimenta… ☆491 · Updated last week
- [ICML 2024] CLLMs: Consistency Large Language Models ☆390 · Updated 5 months ago
- Scalable and Performant Data Loading ☆247 · Updated this week
- Muon optimizer: >30% sample efficiency with <3% wallclock overhead ☆597 · Updated last month
- What would you do with 1000 H100s... ☆1,043 · Updated last year
- A pure NumPy implementation of Mamba. ☆222 · Updated 9 months ago
- Understand and test language model architectures on synthetic tasks. ☆194 · Updated last month
- Helpful tools and examples for working with flex-attention ☆746 · Updated 3 weeks ago
- ☆241 · Updated last year
- Best practices & guides on how to write distributed PyTorch training code ☆406 · Updated 2 months ago
- A Jax-based library for designing and training small transformers. ☆286 · Updated 8 months ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" (see the sketch after this list) ☆232 · Updated 2 months ago
- Large Context Attention ☆707 · Updated 3 months ago
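
The Based entry above refers to the causal linear-attention family, where attention is computed with a running sum instead of a full softmax over the sequence. The sketch below is an illustrative simplification (a generic `elu + 1` feature map and a single head are assumptions; Based itself uses a Taylor-approximation feature map and adds sliding-window attention), not the Based codebase:

```python
# Minimal sketch of causal linear attention via a running (K^T V) sum.
# Assumptions: single head, elu+1 feature map, matching q/k/v dims.
import jax
import jax.numpy as jnp

def linear_attention(q, k, v):
    """q, k, v: (seq_len, dim); returns (seq_len, dim) outputs."""
    phi = lambda t: jax.nn.elu(t) + 1.0      # positive feature map
    q, k = phi(q), phi(k)

    def step(carry, qkv):
        s, z = carry                          # s: (dim, dim), z: (dim,)
        q_t, k_t, v_t = qkv
        s = s + jnp.outer(k_t, v_t)           # accumulate k_t v_t^T
        z = z + k_t                           # accumulate normalizer
        out = (q_t @ s) / (q_t @ z + 1e-6)    # normalized attention output
        return (s, z), out

    d = q.shape[-1]
    init = (jnp.zeros((d, d)), jnp.zeros(d))
    _, out = jax.lax.scan(step, init, (q, k, v))
    return out
```

The fixed-size state `(s, z)` is what gives linear attention its constant-memory, recurrent inference mode; the recall-throughput tradeoff in the paper's title comes from how much of the context that small state can retain.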