google-deepmind / recurrentgemma
Open weights language model from Google DeepMind, based on Griffin.
☆606 · Updated 4 months ago
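Since recurrentgemma is based on the Griffin architecture, the following is a minimal, illustrative PyTorch sketch of the kind of gated linear recurrence (RG-LRU style) that Griffin uses in place of attention in its recurrent blocks. This is not the recurrentgemma code: the class and parameter names (`SimpleGatedLinearRecurrence`, `decay_logit`) and the exponent constant are assumptions made for illustration, and the real implementation uses a fused scan kernel rather than a Python loop over time steps.

```python
# Simplified sketch of a Griffin-style gated linear recurrence (RG-LRU).
# Illustrative only; NOT the recurrentgemma implementation.
import torch
import torch.nn as nn


class SimpleGatedLinearRecurrence(nn.Module):
    """Per-channel linear recurrence with input and recurrence gates."""

    def __init__(self, dim: int):
        super().__init__()
        self.input_gate = nn.Linear(dim, dim)
        self.recurrence_gate = nn.Linear(dim, dim)
        # Learnable decay logit; sigmoid keeps the base decay in (0, 1).
        self.decay_logit = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        batch, seq_len, dim = x.shape
        a = torch.sigmoid(self.decay_logit)            # base decay per channel
        i_t = torch.sigmoid(self.input_gate(x))        # input gate
        r_t = torch.sigmoid(self.recurrence_gate(x))   # recurrence gate
        # Input-dependent decay: a ** (c * r_t); c = 8.0 is an assumed constant.
        a_t = a.pow(8.0 * r_t)
        gated_x = i_t * x
        h = torch.zeros(batch, dim, device=x.device, dtype=x.dtype)
        outputs = []
        for t in range(seq_len):
            # h_t = a_t * h_{t-1} + sqrt(1 - a_t^2) * (i_t * x_t)
            h = a_t[:, t] * h + torch.sqrt(1.0 - a_t[:, t] ** 2) * gated_x[:, t]
            outputs.append(h)
        return torch.stack(outputs, dim=1)


if __name__ == "__main__":
    block = SimpleGatedLinearRecurrence(dim=16)
    y = block(torch.randn(2, 10, 16))
    print(y.shape)  # torch.Size([2, 10, 16])
```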
Related projects
Alternatives and complementary repositories for recurrentgemma
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆479 · Updated 2 weeks ago
- A small code base for training large models ☆264 · Updated last week
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch ☆474 · Updated 2 weeks ago
- Annotated version of the Mamba paper ☆455 · Updated 8 months ago
- [ICML 2024] CLLMs: Consistency Large Language Models ☆350 · Updated 3 months ago
- Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling" ☆801 · Updated 2 months ago
- Minimalistic large language model 3D-parallelism training ☆1,227 · Updated this week
- Flash Attention in ~100 lines of CUDA (forward pass only) ☆609 · Updated 7 months ago
- NanoGPT (124M) quality in 8.2 minutes ☆946 · Updated this week
- Helpful tools and examples for working with flex-attention ☆460 · Updated 2 weeks ago
- Best practices & guides on how to write distributed PyTorch training code ☆278 · Updated this week
- Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture" ☆536 · Updated 5 months ago
- Transformers with Arbitrarily Large Context ☆637 · Updated 2 months ago
- Minimalistic, extremely fast, and hackable researcher's toolbench for GPT models in 307 lines of code. Reaches <3.8 validation loss on wi… ☆332 · Updated 3 months ago
- Official implementation of Half-Quadratic Quantization (HQQ) ☆698 · Updated last week
- Fast bare-bones BPE for modern tokenizer training ☆142 · Updated 2 weeks ago
- Puzzles for learning Triton ☆1,068 · Updated last month
- For optimization algorithm research and development. ☆408 · Updated this week
- PyTorch implementation of models from the Zamba2 series. ☆158 · Updated 2 months ago
- Tile primitives for speedy kernels ☆1,629 · Updated this week
- Official codebase for the paper "Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping". ☆313 · Updated 4 months ago
- MINT-1T: A one trillion token multimodal interleaved dataset. ☆770 · Updated 3 months ago
- Long context evaluation for large language models ☆185 · Updated this week
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models". ☆261 · Updated last year