Simple Transformer in Jax
☆144 Jun 22, 2024 Updated last year
Alternatives and similar repositories for simple_transformer
Users who are interested in simple_transformer are comparing it to the libraries listed below.
- ☆40 Jul 26, 2024 Updated last year
- Entropy Based Sampling and Parallel CoT Decoding ☆3,435 Nov 13, 2024 Updated last year
- Training code for Sparse Autoencoders on Embedding models ☆39 May 9, 2026 Updated last month
- Jax like function transformation engine but micro, microjax ☆34 Oct 25, 2024 Updated last year
- smol models are fun too ☆94 Nov 9, 2024 Updated last year
- Frechet inception distance (FID) evaluation in JAX ☆14 May 28, 2024 Updated 2 years ago
- smolLM with Entropix sampler on pytorch ☆149 Oct 31, 2024 Updated last year
- A graph visualization of attention ☆56 May 20, 2025 Updated last year
- ☆14 Apr 16, 2025 Updated last year
- An implementation of the Llama architecture, to instruct and delight ☆21 May 31, 2025 Updated last year
- Sparsify transformers with SAEs and transcoders ☆725 Jun 8, 2026 Updated last week
- Training Models Daily ☆16 Dec 19, 2023 Updated 2 years ago
- ☆306 Jul 15, 2024 Updated last year
- gzip Predicts Data-dependent Scaling Laws ☆35 May 28, 2024 Updated 2 years ago
- High Quality Resources on GPU Programming/Architecture ☆592 Jul 26, 2024 Updated last year
- DeMo: Decoupled Momentum Optimization ☆201 Dec 2, 2024 Updated last year
- An introduction to LLM Sampling ☆80 Dec 15, 2024 Updated last year
- Knowledge base Claude application ☆43 Jan 3, 2026 Updated 5 months ago
- Efficient Scaling laws and collaborative pretraining. ☆22 Sep 18, 2025 Updated 8 months ago
- NanoGPT-speedrunning for the poor T4 enjoyers ☆74 Apr 22, 2025 Updated last year
- Automatically annotates YOLO dataset using Moondream visual model ☆21 Aug 24, 2025 Updated 9 months ago
- Reasoning Computers. Lambda Calculus, Fully Differentiable. Also Neural Stacks, Queues, Arrays, Lists, Trees, and Latches. ☆288 Nov 3, 2024 Updated last year
- Build your own visual reasoning model ☆423 Jan 13, 2026 Updated 5 months ago
- No frills LLM-assisted programming ☆251 Jul 24, 2024 Updated last year
- ☆27 Jul 9, 2024 Updated last year
- utilities for batched llm calls with retries ☆51 Updated this week
- Our library for RL environments + evals ☆4,187 Updated this week
- Minimal implementation of scalable rectified flow transformers, based on SD3's approach ☆635 Jul 1, 2024 Updated last year
- look how they massacred my boy ☆63 Oct 16, 2024 Updated last year
- See https://github.com/cuda-mode/triton-index/ instead! ☆11 May 8, 2024 Updated 2 years ago
- supporting pytorch FSDP for optimizers ☆84 Dec 8, 2024 Updated last year
- An automated tool for discovering insights from research paper corpora ☆137 Jun 8, 2024 Updated 2 years ago
- ☆93 Jul 5, 2024 Updated last year
- ☆12 Jun 2, 2023 Updated 3 years ago
- NanoGPT (124M) in 90 seconds ☆5,385 Updated this week
- Smart reproducible analytical pipeline inspection ☆21 Feb 13, 2026 Updated 4 months ago
- ☆22 Nov 9, 2024 Updated last year
- hakken is a coding agent which needs hell lot of context ☆31 Dec 4, 2025 Updated 6 months ago
- Contains my experiments with the `big_vision` repo to train ViTs on ImageNet-1k. ☆22 Jan 16, 2023 Updated 3 years ago