google-deepmind / recurrentgemma
Open weights language model from Google DeepMind, based on Griffin.
☆641 · Updated 2 weeks ago
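RecurrentGemma checkpoints can also be loaded outside the official codebase. Below is a minimal inference sketch assuming the Hugging Face `transformers` port (version 4.40 or later) and the `google/recurrentgemma-2b` checkpoint; the repository itself ships its own JAX and PyTorch implementations, which this sketch does not use.

```python
# Minimal sketch: run RecurrentGemma via Hugging Face transformers.
# Assumes transformers >= 4.40 (which added RecurrentGemma support)
# and access to the google/recurrentgemma-2b checkpoint on the Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/recurrentgemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Tokenize a prompt and generate a short continuation.
inputs = tokenizer("The Griffin architecture mixes", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```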
Alternatives and similar repositories for recurrentgemma
Users interested in recurrentgemma are comparing it to the libraries listed below.
- a small code base for training large models ☆301 · Updated last month
- Annotated version of the Mamba paper ☆485 · Updated last year
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆235 · Updated 2 weeks ago
- ☆270 · Updated 11 months ago
- ☆190 · Updated this week
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆556 · Updated this week
- A complete end-to-end pipeline for LLM interpretability with sparse autoencoders (SAEs) using Llama 3.2, written in pure PyTorch and full… ☆615 · Updated 2 months ago
- [ICLR 2025] Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling ☆881 · Updated last month
- Visualize the intermediate output of Mistral 7B ☆367 · Updated 5 months ago
- Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture" ☆554 · Updated 5 months ago
- PyTorch implementation of models from the Zamba2 series. ☆182 · Updated 4 months ago
- For optimization algorithm research and development. ☆521 · Updated this week
- Understand and test language model architectures on synthetic tasks. ☆217 · Updated 2 weeks ago
- [ICML 2024] CLLMs: Consistency Large Language Models ☆394 · Updated 7 months ago
- A repository for research on medium-sized language models. ☆498 · Updated 2 weeks ago
- Code repository for Black Mamba ☆246 · Updated last year
- Fast bare-bones BPE for modern tokenizer training ☆159 · Updated 2 months ago
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch ☆519 · Updated last month
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆239 · Updated 4 months ago
- Best practices & guides on how to write distributed PyTorch training code ☆441 · Updated 3 months ago
- Inference code for Persimmon-8B ☆415 · Updated last year
- ☆303 · Updated last year
- Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI ☆284 · Updated 2 weeks ago
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers. ☆311 · Updated 8 months ago
- Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax ☆595 · Updated this week
- ☆286 · Updated last month
- Pax is a Jax-based machine learning framework for training large scale models. Pax allows for advanced and fully configurable experimenta… ☆510 · Updated last week
- Large Context Attention ☆716 · Updated 4 months ago
- Puzzles for exploring transformers ☆349 · Updated 2 years ago
- Normalized Transformer (nGPT) ☆183 · Updated 7 months ago