ScalingIntelligence / tokasaurus
☆462 · Updated last month
Alternatives and similar repositories for tokasaurus
Users interested in tokasaurus are comparing it to the libraries listed below.
- Storing long contexts in tiny caches with self-study ☆231 · Updated last month
- Pytorch script hot swap: Change code without unloading your LLM from VRAM ☆125 · Updated 9 months ago
- ☆218 · Updated 11 months ago
- Reverse Engineering Gemma 3n: Google's New Edge-Optimized Language Model ☆259 · Updated 7 months ago
- MoE training for Me and You and maybe other people ☆319 · Updated 2 weeks ago
- ☆253 · Updated 10 months ago
- ☆237 · Updated 2 weeks ago
- Code to train and evaluate Neural Attention Memory Models to obtain universally applicable memory systems for transformers ☆345 · Updated last year
- GRPO training code which scales to 32xH100s for long-horizon terminal/coding tasks. Base agent is now the top Qwen3 agent on Stanford's T… ☆330 · Updated 4 months ago
- Simple & Scalable Pretraining for Neural Architecture Research ☆306 · Updated last month
- rl from zero pretrain, can it be done? yes. ☆286 · Updated 3 months ago
- LLM Inference on consumer devices ☆128 · Updated 10 months ago
- SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks? ☆246 · Updated last week
- Super basic implementation (gist-like) of RLMs with REPL environments ☆435 · Updated 2 weeks ago
- Pivotal Token Search ☆142 · Updated last month
- Long context evaluation for large language models ☆225 · Updated 10 months ago
- ☆214 · Updated this week
- Train your own SOTA deductive reasoning model ☆107 · Updated 10 months ago
- ☆250 · Updated last year
- PyTorch implementation of models from the Zamba2 series ☆186 · Updated 11 months ago
- Simple high-throughput inference library ☆155 · Updated 8 months ago
- Async RL Training at Scale ☆1,005 · Updated this week
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆250 · Updated 11 months ago
- Scalable and robust tree-based speculative decoding algorithm ☆366 · Updated 11 months ago
- An implementation of bucketMul LLM inference ☆223 · Updated last year
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆131 · Updated last year
- CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning ☆318 · Updated last week
- Samples of good AI-generated CUDA kernels ☆99 · Updated 7 months ago
- j1-micro (1.7B) & j1-nano (600M) are absurdly tiny but mighty reward models ☆101 · Updated 6 months ago
- A character-level language diffusion model trained on Tiny Shakespeare ☆830 · Updated this week