ScalingIntelligence / tokasaurus
☆466 · Updated 2 months ago
Alternatives and similar repositories for tokasaurus
Users interested in tokasaurus are comparing it to the libraries listed below:
- Storing long contexts in tiny caches with self-study ☆236 · Updated 2 months ago
- PyTorch script hot swap: Change code without unloading your LLM from VRAM ☆125 · Updated 9 months ago
- ☆219 · Updated last year
- Reverse Engineering Gemma 3n: Google's New Edge-Optimized Language Model ☆262 · Updated 8 months ago
- MoE training for Me and You and maybe other people ☆335 · Updated last month
- Simple & Scalable Pretraining for Neural Architecture Research ☆307 · Updated 2 months ago
- ☆237 · Updated last month
- ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs) ☆273 · Updated this week
- Simple high-throughput inference library ☆155 · Updated 8 months ago
- CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning ☆417 · Updated last month
- SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks? ☆259 · Updated last month
- GRPO training code which scales to 32xH100s for long-horizon terminal/coding tasks. Base agent is now the top Qwen3 agent on Stanford's T… ☆344 · Updated 5 months ago
- Scalable and robust tree-based speculative decoding algorithm ☆366 · Updated last year
- Train your own SOTA deductive reasoning model ☆107 · Updated 11 months ago
- ☆258 · Updated 11 months ago
- LLM inference on consumer devices ☆129 · Updated 10 months ago
- PCCL (Prime Collective Communications Library) implements fault-tolerant collective communications over IP ☆141 · Updated 4 months ago
- Code to train and evaluate Neural Attention Memory Models to obtain universally applicable memory systems for transformers ☆347 · Updated last year
- PyTorch implementation of models from the Zamba2 series ☆186 · Updated last year
- Checkpoint-engine is a simple middleware to update model weights in LLM inference engines ☆902 · Updated last week
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆252 · Updated last year
- Code for training & evaluating Contextual Document Embedding models ☆202 · Updated 8 months ago
- Super basic implementation (gist-like) of RLMs with REPL environments ☆592 · Updated last month
- FlexAttention-based, minimal vLLM-style inference engine for fast Gemma 2 inference ☆334 · Updated 3 months ago
- A character-level language diffusion model trained on Tiny Shakespeare ☆849 · Updated 3 weeks ago
- An implementation of bucketMul LLM inference ☆224 · Updated last year
- Simple and efficient DeepSeek V3 SFT using pipeline parallelism and expert parallelism, with both FP8 and BF16 training ☆114 · Updated 6 months ago
- RL from zero pretrain: can it be done? Yes. ☆286 · Updated 4 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆131 · Updated last year
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) ☆475 · Updated this week