thepowerfuldeez / sample_efficient_gpt
Training framework aimed at exploring the frontier of sample efficiency in small language models
★ 81 · Updated last week
Alternatives and similar repositories for sample_efficient_gpt
Users interested in sample_efficient_gpt are comparing it to the repositories listed below.
- NanoGPT-speedrunning for the poor T4 enjoyers · ★ 73 · Updated 7 months ago
- Small Batch Size Training for Language Models · ★ 68 · Updated 2 months ago
- The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag… · ★ 120 · Updated 2 months ago
- Simple repository for training small reasoning models · ★ 47 · Updated 10 months ago
- Collection of autoregressive model implementations · ★ 85 · Updated 7 months ago
- Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax" · ★ 85 · Updated 3 months ago
- An open-source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) · ★ 108 · Updated 9 months ago
- LLM training in simple, raw C/CUDA · ★ 15 · Updated last year
- Simple GRPO scripts and configurations · ★ 59 · Updated 10 months ago
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources · ★ 148 · Updated 2 months ago
- Tiny re-implementation of MDM in the style of LLaDA and the nano-gpt speedrun · ★ 57 · Updated 9 months ago
- Synthetic data generation and benchmark implementation for "Episodic Memories Generation and Evaluation Benchmark for Large Language Mode…" · ★ 62 · Updated 2 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding · ★ 173 · Updated 11 months ago
- Jax-like function transformation engine, but micro: microjax · ★ 34 · Updated last year
- DeMo: Decoupled Momentum Optimization · ★ 197 · Updated last year
- A collection of lightweight interpretability scripts to understand how LLMs think · ★ 70 · Updated this week
- H-Net Dynamic Hierarchical Architecture · ★ 80 · Updated 3 months ago
- Minimal (400 LOC) implementation, maximum (multi-node, FSDP) GPT training · ★ 132 · Updated last year
- Compiling useful links, papers, benchmarks, ideas, etc. · ★ 45 · Updated 8 months ago
- RL from zero pretrain: can it be done? Yes. · ★ 282 · Updated 2 months ago
- Supporting code for the blog post on modular manifolds · ★ 105 · Updated 2 months ago
- Normalized Transformer (nGPT) · ★ 193 · Updated last year