ScalingIntelligence / tokasaurus
☆407 · Updated this week
Alternatives and similar repositories for tokasaurus
Users interested in tokasaurus are comparing it to the libraries listed below.
- PyTorch script hot swap: Change code without unloading your LLM from VRAM ☆126 · Updated 4 months ago
- Storing long contexts in tiny caches with self-study ☆145 · Updated this week
- GRPO training code which scales to 32xH100s for long-horizon terminal/coding tasks. Base agent is now the top Qwen3 agent on Stanford's T… ☆232 · Updated last week
- Simple & Scalable Pretraining for Neural Architecture Research ☆289 · Updated last week
- Reverse Engineering Gemma 3n: Google's New Edge-Optimized Language Model ☆238 · Updated 3 months ago
- Code to train and evaluate Neural Attention Memory Models to obtain universally applicable memory systems for transformers ☆319 · Updated 10 months ago
- ☆221 · Updated 5 months ago
- ☆217 · Updated 7 months ago
- LLM Inference on consumer devices ☆124 · Updated 5 months ago
- An implementation of bucketMul LLM inference ☆223 · Updated last year
- Super-fast Structured Outputs ☆441 · Updated 2 weeks ago
- ☆220 · Updated 2 months ago
- RL from zero pretrain: can it be done? Yes. ☆261 · Updated last week
- Train your own SOTA deductive reasoning model ☆104 · Updated 5 months ago
- PyTorch Single Controller ☆368 · Updated last week
- PyTorch implementation of models from the Zamba2 series ☆184 · Updated 7 months ago
- Pivotal Token Search ☆121 · Updated last month
- SIMD quantization kernels ☆83 · Updated last week
- Felafax is building AI infra for non-NVIDIA GPUs ☆567 · Updated 7 months ago
- ☆197 · Updated 3 months ago
- PCCL (Prime Collective Communications Library) implements fault-tolerant collective communications over IP ☆106 · Updated last week
- A scalable and robust tree-based speculative decoding algorithm ☆355 · Updated 7 months ago
- Decentralized RL Training at Scale ☆472 · Updated this week
- A small code base for training large models ☆309 · Updated 4 months ago
- A simple tool that lets you explore different possible paths that an LLM might sample ☆185 · Updated 3 months ago
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆245 · Updated 7 months ago
- Simple high-throughput inference library ☆127 · Updated 3 months ago
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆223 · Updated this week
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU Clusters ☆129 · Updated 8 months ago
- Samples of good AI-generated CUDA kernels ☆89 · Updated 3 months ago