google-deepmind / recurrentgemma
Open weights language model from Google DeepMind, based on Griffin.
☆639 · Updated last week
Alternatives and similar repositories for recurrentgemma
Users interested in recurrentgemma are comparing it to the libraries listed below.
- [ICLR 2025] Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling ☆876 · Updated last month
- a small code base for training large models ☆299 · Updated last month
- Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax ☆584 · Updated this week
- Annotated version of the Mamba paper ☆482 · Updated last year
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆544 · Updated this week
- Pax is a Jax-based machine learning framework for training large scale models. Pax allows for advanced and fully configurable experimenta… ☆498 · Updated last week
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch ☆513 · Updated 2 weeks ago
- ☆267 · Updated 10 months ago
- Muon optimizer: >30% sample efficiency with <3% wallclock overhead (a sketch of the underlying update appears after this list) ☆661 · Updated this week
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers. ☆309 · Updated 7 months ago
- ☆309 · Updated last week
- Large Context Attention ☆711 · Updated 4 months ago
- A complete end-to-end pipeline for LLM interpretability with sparse autoencoders (SAEs) using Llama 3.2, written in pure PyTorch and full… ☆614 · Updated 2 months ago
- ☆190 · Updated this week
- A pure NumPy implementation of Mamba. ☆223 · Updated 10 months ago
- Official codebase for the paper "Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping". ☆367 · Updated 11 months ago
- A JAX research toolkit for building, editing, and visualizing neural networks. ☆1,780 · Updated last month
- [ICML 2024] CLLMs: Consistency Large Language Models ☆391 · Updated 6 months ago
- ☆474 · Updated 10 months ago
- PyTorch implementation of models from the Zamba2 series. ☆181 · Updated 4 months ago
- Code repository for Black Mamba ☆246 · Updated last year
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… (a sketch of the key-value lookup idea appears after this list) ☆333 · Updated 5 months ago
- [ICLR2025 Spotlight🔥] Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters ☆559 · Updated 3 months ago
- A simple, performant and scalable Jax LLM! ☆1,734 · Updated this week
- Minimalistic, extremely fast, and hackable researcher's toolbench for GPT models in 307 lines of code. Reaches <3.8 validation loss on wi… ☆345 · Updated 10 months ago
- Language Modeling with the H3 State Space Model ☆518 · Updated last year
- Helpful tools and examples for working with flex-attention ☆802 · Updated last week
- The repository for the code of the UltraFastBERT paper ☆514 · Updated last year
- Inference code for Persimmon-8B ☆415 · Updated last year
- For optimization algorithm research and development. ☆518 · Updated this week
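Two of the entries above name techniques concrete enough to illustrate briefly. First, the Muon optimizer: the sample-efficiency gain it advertises comes from taking the standard momentum buffer of each 2-D weight matrix and approximately orthogonalizing it with a few Newton-Schulz iterations before applying the update. The sketch below shows only that idea; it is not the repository's implementation, and the coefficients, function names, and hyperparameters here are illustrative.

```python
import torch

def newton_schulz_orthogonalize(M: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately push the singular values of M toward 1 using an
    odd-polynomial Newton-Schulz iteration (illustrative coefficients)."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = M / (M.norm() + 1e-7)          # scale so the iteration converges
    transposed = X.shape[0] > X.shape[1]
    if transposed:                      # work on the smaller Gram matrix
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

@torch.no_grad()
def muon_style_step(param, grad, momentum_buf, lr=0.02, beta=0.95):
    """One illustrative update: accumulate momentum, orthogonalize it, apply."""
    momentum_buf.mul_(beta).add_(grad)
    update = newton_schulz_orthogonalize(momentum_buf)
    param.add_(update, alpha=-lr)

# Toy usage on a single 2-D weight matrix.
w = torch.randn(256, 128)
g = torch.randn_like(w)
buf = torch.zeros_like(w)
muon_style_step(w, g, buf)
```

As described by its authors, the orthogonalized update is meant for hidden 2-D weight matrices; embeddings, norms, and other parameters are typically left to a standard elementwise optimizer such as AdamW.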
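Second, the memory-layers entry: it describes a trainable key-value lookup that grows a model's parameter count without growing its per-token FLOPs, because each token reads from only a handful of selected slots. A minimal PyTorch sketch of that idea follows, with illustrative module and dimension names rather than the repository's actual API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KeyValueMemoryLayer(nn.Module):
    """Minimal sketch of a memory layer: a large trainable key/value table
    queried with a sparse top-k lookup, so parameter count grows with the
    table size while per-token compute stays roughly constant."""

    def __init__(self, d_model: int, num_slots: int = 4096, top_k: int = 4):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(num_slots, d_model) * 0.02)
        self.values = nn.Parameter(torch.randn(num_slots, d_model) * 0.02)
        self.query_proj = nn.Linear(d_model, d_model)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        q = self.query_proj(x)                    # queries for the lookup
        scores = q @ self.keys.T                  # (batch, seq, num_slots)
        top_scores, top_idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(top_scores, dim=-1)   # mix only the selected slots
        selected = self.values[top_idx]           # (batch, seq, top_k, d_model)
        return x + (weights.unsqueeze(-1) * selected).sum(dim=-2)

# Usage: drop-in residual block on a toy activation tensor.
layer = KeyValueMemoryLayer(d_model=64)
out = layer(torch.randn(2, 16, 64))
print(out.shape)  # torch.Size([2, 16, 64])
```

The dense scoring over every key above is for readability only; memory-layer implementations typically use product-key decompositions so that the top-k search itself stays cheap as the table grows.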