A ~950-line, minimal, extensible LLM inference engine built from scratch.
☆474 · Jan 9, 2026 · Updated 5 months ago
Alternatives and similar repositories for simple-llm
Users that are interested in simple-llm are comparing it to the libraries listed below.
- Continual Learning Bench ☆136 · Updated this week
- Tiny evaluation of leading LLMs on competitive programming problems ☆14 · Apr 10, 2026 · Updated last month
- Scripts for training Qwen 2.5 VL with ms-swift and GRPO ☆12 · Feb 27, 2025 · Updated last year
- KV Cache & LoRA for minGPT ☆63 · Mar 4, 2026 · Updated 3 months ago
- Official Implementation for the Paper [AgentArk: Distilling Multi-Agent Intelligence into a Single LLM Agent](https://arxiv.org/abs/2602.… ☆51 · May 9, 2026 · Updated last month
- Crosshair guidelines for ComfyUI to help align nodes and groups while moving or resizing. ☆35 · Apr 28, 2026 · Updated last month
- Official repo for UAE ☆200 · May 31, 2026 · Updated last week
- GEMV implementation with CUTLASS ☆21 · Aug 21, 2025 · Updated 9 months ago
- Github action to upload datasets to kaggle ☆24 · May 20, 2026 · Updated 3 weeks ago
- A locally trained model of Stoney Nakoda has been developed and released. You can access the working model here or train your own instanc… ☆10 · May 27, 2026 · Updated 2 weeks ago
- Storing long contexts in tiny caches with self-study ☆270 · Mar 23, 2026 · Updated 2 months ago
- ☆39 · Feb 18, 2025 · Updated last year
- ☆21 · Mar 3, 2025 · Updated last year
- Load and run Llama from safetensors files in C ☆15 · Oct 24, 2024 · Updated last year
- ☆11 · Jan 9, 2019 · Updated 7 years ago
- Example repo showcasing model training and deployment with distil claude cli skill ☆55 · Jan 19, 2026 · Updated 4 months ago
- Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs ☆14 · Apr 3, 2025 · Updated last year
- MCP server for the X (Twitter) API -- give AI agents the ability to post, search, read, and engage on X ☆46 · Mar 24, 2026 · Updated 2 months ago
- ☆10 · Oct 27, 2020 · Updated 5 years ago
- ☆254 · Jan 2, 2025 · Updated last year
- Adding Marimo to Datasette ☆21 · Mar 24, 2025 · Updated last year
- NanoGPT (124M) in 90 seconds ☆5,337 · Jun 2, 2026 · Updated last week
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds ☆379 · Nov 15, 2025 · Updated 6 months ago
- ☆13 · Nov 30, 2022 · Updated 3 years ago
- General-purpose planning and execution harness for LLMs: structured phases, critique, gating, and review ☆83 · Updated this week
- Repository for the implementation and evaluation of DD-GloVe, a train-time debiasing algorithm to learn GloVe word embeddings by leveragi… ☆13 · May 29, 2022 · Updated 4 years ago
- A fully functional Convolutional VAE implemented in pure C from scratch. ☆23 · Jan 19, 2026 · Updated 4 months ago
- [CVPR 2026] Official repo for "EVATok: Adaptive Length Video Tokenization for Efficient Visual Autoregressive Generation" ☆60 · Mar 13, 2026 · Updated 2 months ago
- AI-Driven Research Systems (ADRS) ☆143 · Dec 17, 2025 · Updated 5 months ago
- Tools to make language models a bit easier to use ☆65 · May 26, 2026 · Updated 2 weeks ago
- ☆76 · May 5, 2026 · Updated last month
- Code for the WWW'23 paper "Sanitizing Sentence Embeddings (and Labels) for Local Differential Privacy" ☆12 · Feb 20, 2023 · Updated 3 years ago
- ☆20 · May 30, 2024 · Updated 2 years ago
- ☆49 · Mar 23, 2026 · Updated 2 months ago
- RWKV v5,v6 LoRA Trainer on Cuda and Rocm Platform. RWKV is a RNN with transformer-level LLM performance. It can be directly trained like … ☆13 · Mar 24, 2024 · Updated 2 years ago
- Fast and low-memory attention layer written in CUDA ☆20 · Jul 14, 2023 · Updated 2 years ago
- PyTorch implementation of quantization-aware matrix factorization (QMF) for data compression ☆16 · Jul 14, 2025 · Updated 10 months ago
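- Reverse proxy for the Google AIStudio Playground. Supports Google memberships (Pro/Ultra) and the native Gemini protocol format, including image generation, tool calling, and Google Search. ☆145 · May 27, 2026 · Updated last week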
- Reinforcement Learning example in Nim, playing tic-tac-toe. Based on the original C version from the great Antirez ☆15 · Apr 2, 2025 · Updated last year