ScalingIntelligence / tokasaurus
☆369 · Updated this week
Alternatives and similar repositories for tokasaurus
Users interested in tokasaurus are comparing it to the libraries listed below.
- PyTorch script hot swap: Change code without unloading your LLM from VRAM ☆126 · Updated 2 months ago
- Pivotal Token Search ☆109 · Updated last week
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers. ☆316 · Updated 8 months ago
- ☆215 · Updated 4 months ago
- Super-fast Structured Outputs ☆334 · Updated last week
- ☆214 · Updated 5 months ago
- Reverse Engineering Gemma 3n: Google's New Edge-Optimized Language Model ☆225 · Updated last month
- ☆188 · Updated 3 weeks ago
- LLM Inference on consumer devices ☆121 · Updated 4 months ago
- An implementation of bucketMul LLM inference ☆220 · Updated last year
- A complete end-to-end pipeline for LLM interpretability with sparse autoencoders (SAEs) using Llama 3.2, written in pure PyTorch and full… ☆620 · Updated 3 months ago
- Visualize the intermediate output of Mistral 7B ☆366 · Updated 5 months ago
- PyTorch implementation of models from the Zamba2 series. ☆184 · Updated 5 months ago
- XTR/WARP (SIGIR'25) is an extremely fast and accurate retrieval engine based on Stanford's ColBERTv2/PLAID and Google DeepMind's XTR. ☆145 · Updated 2 months ago
- Lightweight Nearest Neighbors with Flexible Backends ☆294 · Updated this week
- Train your own SOTA deductive reasoning model ☆99 · Updated 4 months ago
- code for training & evaluating Contextual Document Embedding models ☆194 · Updated 2 months ago
- prime-rl is a codebase for decentralized async RL training at scale ☆368 · Updated this week
- Long context evaluation for large language models ☆220 · Updated 4 months ago
- ☆196 · Updated 2 months ago
- A curated list of data for reasoning AI ☆136 · Updated 11 months ago
- ☆215 · Updated 2 weeks ago
- Fast parallel LLM inference for MLX ☆198 · Updated last year
- ☆186 · Updated this week
- ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs) ☆165 · Updated this week
- A simple tool that lets you explore different possible paths that an LLM might sample. ☆175 · Updated 2 months ago
- PyTorch Single Controller ☆325 · Updated this week
- SIMD quantization kernels ☆73 · Updated this week
- Simple high-throughput inference library ☆120 · Updated 2 months ago
- Official PyTorch implementation for Hogwild! Inference: Parallel LLM Generation with a Concurrent Attention Cache ☆113 · Updated this week