ScalingIntelligence / tokasaurus
☆449 · Updated this week
Alternatives and similar repositories for tokasaurus
Users interested in tokasaurus are comparing it to the libraries listed below.
- PyTorch script hot swap: Change code without unloading your LLM from VRAM ☆124 · Updated 6 months ago
- Storing long contexts in tiny caches with self-study ☆205 · Updated last week
- Simple & Scalable Pretraining for Neural Architecture Research ☆297 · Updated 2 months ago
- ☆218 · Updated 9 months ago
- Reverse Engineering Gemma 3n: Google's New Edge-Optimized Language Model ☆249 · Updated 5 months ago
- GRPO training code which scales to 32xH100s for long horizon terminal/coding tasks. Base agent is now the top Qwen3 agent on Stanford's T… ☆286 · Updated 2 months ago
- ☆231 · Updated 4 months ago
- LLM Inference on consumer devices ☆125 · Updated 7 months ago
- rl from zero pretrain, can it be done? yes. ☆279 · Updated last month
- ☆233 · Updated 7 months ago
- ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs) ☆231 · Updated last week
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers. ☆325 · Updated last year
- j1-micro (1.7B) & j1-nano (600M) are absurdly tiny but mighty reward models. ☆98 · Updated 3 months ago
- Train your own SOTA deductive reasoning model ☆109 · Updated 7 months ago
- Super basic implementation (gist-like) of RLMs with REPL environments. ☆204 · Updated 2 weeks ago
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆248 · Updated 9 months ago
- Pivotal Token Search ☆130 · Updated 3 months ago
- Samples of good AI generated CUDA kernels ☆91 · Updated 5 months ago
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆288 · Updated this week
- NanoGPT-speedrunning for the poor T4 enjoyers ☆72 · Updated 6 months ago
- PyTorch implementation of models from the Zamba2 series. ☆185 · Updated 9 months ago
- code for training & evaluating Contextual Document Embedding models ☆199 · Updated 5 months ago
- Felafax is building AI infra for non-NVIDIA GPUs ☆568 · Updated 9 months ago
- Simple high-throughput inference library ☆149 · Updated 5 months ago
- Long context evaluation for large language models ☆224 · Updated 7 months ago
- An implementation of bucketMul LLM inference ☆223 · Updated last year
- an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆107 · Updated 7 months ago
- SIMD quantization kernels ☆89 · Updated last month
- Fast parallel LLM inference for MLX ☆224 · Updated last year
- XTR/WARP (SIGIR'25) is an extremely fast and accurate retrieval engine based on Stanford's ColBERTv2/PLAID and Google DeepMind's XTR. ☆168 · Updated 5 months ago