Simple Transformer in Jax
☆143Jun 22, 2024Updated last year
Alternatives and similar repositories for simple_transformer
Users who are interested in simple_transformer are comparing it to the repositories listed below.
- ☆40Jul 26, 2024Updated last year
- Entropy Based Sampling and Parallel CoT Decoding☆3,431Nov 13, 2024Updated last year
- Training code for Sparse Autoencoders on Embedding models☆39Apr 25, 2026Updated last week
- Jax like function transformation engine but micro, microjax☆34Oct 25, 2024Updated last year
- smol models are fun too☆93Nov 9, 2024Updated last year
- Frechet inception distance (FID) evaluation in JAX☆14May 28, 2024Updated last year
- smolLM with Entropix sampler on pytorch☆149Oct 31, 2024Updated last year
- A graph visualization of attention☆56May 20, 2025Updated 11 months ago
- ☆14Apr 16, 2025Updated last year
- Small autodiff lib and a simple working feedforward neural net in Haskell on top of it, from scratch, zero-deps.☆16Jun 21, 2024Updated last year
- An implementation of the Llama architecture, to instruct and delight☆21May 31, 2025Updated 11 months ago
- Sparsify transformers with SAEs and transcoders☆714Apr 27, 2026Updated last week
- ☆308Jul 15, 2024Updated last year
- Training Models Daily☆16Dec 19, 2023Updated 2 years ago
- gzip Predicts Data-dependent Scaling Laws☆35May 28, 2024Updated last year
- High Quality Resources on GPU Programming/Architecture☆591Jul 26, 2024Updated last year
- DeMo: Decoupled Momentum Optimization☆201Dec 2, 2024Updated last year
- An introduction to LLM Sampling☆80Dec 15, 2024Updated last year
- Knowledge base Claude application☆43Jan 3, 2026Updated 4 months ago
- Efficient Scaling laws and collaborative pretraining.☆22Sep 18, 2025Updated 7 months ago
- NanoGPT-speedrunning for the poor T4 enjoyers☆73Apr 22, 2025Updated last year
- Automatically annotates YOLO dataset using Moondream visual model☆19Aug 24, 2025Updated 8 months ago
- Reasoning Computers. Lambda Calculus, Fully Differentiable. Also Neural Stacks, Queues, Arrays, Lists, Trees, and Latches.☆287Nov 3, 2024Updated last year
- A light tensor library in zig.☆77Feb 9, 2025Updated last year
- Build your own visual reasoning model☆422Jan 13, 2026Updated 3 months ago
- ☆27Jul 9, 2024Updated last year
- It's a baby compiler. (Lean btw.)☆16May 19, 2025Updated 11 months ago
- Our library for RL environments + evals☆4,057Apr 30, 2026Updated last week
- utilities for batched llm calls with retries☆50Apr 23, 2026Updated last week
- look how they massacred my boy☆63Oct 16, 2024Updated last year
- See https://github.com/cuda-mode/triton-index/ instead!☆11May 8, 2024Updated last year
- ☆33Nov 4, 2024Updated last year
- An automated tool for discovering insights from research paper corpora☆137Jun 8, 2024Updated last year
- ☆93Jul 5, 2024Updated last year
- ☆12Jun 2, 2023Updated 2 years ago
- NanoGPT (124M) in 90 seconds☆5,200Updated this week
- Smart reproducible analytical pipeline inspection☆21Feb 13, 2026Updated 2 months ago
- ☆22Nov 9, 2024Updated last year
- hakken is a coding agent which needs hell lot of context☆31Dec 4, 2025Updated 5 months ago