ScalingIntelligence / tokasaurus
☆417 · Updated 3 weeks ago
Alternatives and similar repositories for tokasaurus
Users interested in tokasaurus are comparing it to the libraries listed below.
- Storing long contexts in tiny caches with self-study (☆181 · Updated last week)
- GRPO training code that scales to 32xH100s for long-horizon terminal/coding tasks. Base agent is now the top Qwen3 agent on Stanford's T… (☆254 · Updated 3 weeks ago)
- PyTorch script hot swap: change code without unloading your LLM from VRAM (☆125 · Updated 4 months ago)
- Simple & Scalable Pretraining for Neural Architecture Research (☆293 · Updated 3 weeks ago)
- Pivotal Token Search (☆124 · Updated 2 months ago)
- ☆217 · Updated 7 months ago
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers (☆322 · Updated 10 months ago)
- ☆227 · Updated 6 months ago
- Reverse Engineering Gemma 3n: Google's New Edge-Optimized Language Model (☆242 · Updated 3 months ago)
- ☆223 · Updated 2 months ago
- RL from zero pretrain: can it be done? Yes. (☆268 · Updated last month)
- LLM Inference on consumer devices (☆124 · Updated 6 months ago)
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" (☆245 · Updated 7 months ago)
- PyTorch Single Controller (☆419 · Updated this week)
- PyTorch implementation of models from the Zamba2 series (☆185 · Updated 7 months ago)
- Train your own SOTA deductive reasoning model (☆106 · Updated 6 months ago)
- Lightweight toolkit to train and fine-tune 1.58-bit language models (☆88 · Updated 4 months ago)
- ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs) (☆210 · Updated last week)
- Checkpoint-engine is a simple middleware to update model weights in LLM inference engines (☆706 · Updated this week)
- Code for training & evaluating Contextual Document Embedding models (☆197 · Updated 4 months ago)
- Official PyTorch implementation for Hogwild! Inference: Parallel LLM Generation with a Concurrent Attention Cache (☆124 · Updated last month)
- XTR/WARP (SIGIR'25) is an extremely fast and accurate retrieval engine based on Stanford's ColBERTv2/PLAID and Google DeepMind's XTR (☆164 · Updated 4 months ago)
- j1-micro (1.7B) & j1-nano (600M) are absurdly tiny but mighty reward models (☆97 · Updated 2 months ago)
- Felafax is building AI infra for non-NVIDIA GPUs (☆566 · Updated 7 months ago)
- An implementation of bucketMul LLM inference (☆223 · Updated last year)
- PCCL (Prime Collective Communications Library) implements fault-tolerant collective communications over IP (☆120 · Updated last week)
- Long-context evaluation for large language models (☆221 · Updated 6 months ago)
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters (☆129 · Updated 9 months ago)
- Explore token trajectory trees on instruct and base models (☆132 · Updated 3 months ago)
- SIMD quantization kernels (☆87 · Updated last week)