A ~950-line, minimal, extensible LLM inference engine built from scratch.
☆469 · Jan 9, 2026 · Updated 3 months ago
Alternatives and similar repositories for simple-llm
Users that are interested in simple-llm are comparing it to the libraries listed below.
- A DSPy Adapter for exact-fidelity prompt templates with full control over messages. ☆43 · Feb 23, 2026 · Updated last month
- Tiny evaluation of leading LLMs on competitive programming problems ☆14 · Apr 10, 2026 · Updated last week
- 📝 The official repository of "Rethinking Cross-Generator Image Forgery Detection through DINOv3" ☆22 · Dec 2, 2025 · Updated 4 months ago
- ROSA-Tuning ☆71 · Feb 4, 2026 · Updated 2 months ago
- ☆68 · Apr 7, 2026 · Updated last week
- GEMV implementation with CUTLASS ☆21 · Aug 21, 2025 · Updated 7 months ago
- ☆11 · Jun 7, 2023 · Updated 2 years ago
- Code for Bolmo: Byteifying the Next Generation of Language Models ☆130 · Mar 13, 2026 · Updated last month
- Fortnite Proxy Based Private Server ☆15 · Dec 28, 2024 · Updated last year
- Storing long contexts in tiny caches with self-study ☆261 · Mar 23, 2026 · Updated 3 weeks ago
- Example repo showcasing model training and deployment with distil claude cli skill ☆56 · Jan 19, 2026 · Updated 3 months ago
- ☆21 · Mar 3, 2025 · Updated last year
- Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs ☆14 · Apr 3, 2025 · Updated last year
- General-purpose planning and execution harness for LLMs: structured phases, critique, gating, and review ☆53 · Apr 11, 2026 · Updated last week
- Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios. ☆16 · Aug 31, 2023 · Updated 2 years ago
- Online editor for Destack - a page builder based on Next.js 🅧, Tailwind CSS 🍃 & Grapes.js 🍇. ☆14 · Feb 6, 2022 · Updated 4 years ago
- ☆249 · Jan 2, 2025 · Updated last year
- Code for training & evaluating Contextual Document Embedding models ☆203 · May 14, 2025 · Updated 11 months ago
- Adding Marimo to Datasette ☆21 · Mar 24, 2025 · Updated last year
- NanoGPT (124M) in 2 minutes ☆5,095 · Updated this week
- Use the Jimeng (Doubao) Seedream and Seedance models provided by the Volcano Ark API in ComfyUI. ☆38 · Updated this week
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds ☆369 · Nov 15, 2025 · Updated 5 months ago
- Structured Generation Evals ☆14 · Sep 25, 2024 · Updated last year
- A fully functional Convolutional VAE implemented in pure C from scratch. ☆23 · Jan 19, 2026 · Updated 3 months ago
- A terminal-based AI assistant for Linux sysadmins. ☆37 · Mar 20, 2026 · Updated 3 weeks ago
- AI-Driven Research Systems (ADRS) ☆137 · Dec 17, 2025 · Updated 4 months ago
- Tools to make language models a bit easier to use ☆65 · Updated this week
- Less-wrong single-file Numba-accelerated Python implementation of Gotoh affine gap penalty extensions for the Needleman–Wunsch, Smith-Wat… ☆12 · Oct 30, 2025 · Updated 5 months ago
- Code associated with the paper: "Few-Shot Self-Rationalization with Natural Language Prompts" ☆13 · Apr 27, 2022 · Updated 3 years ago
- An overlay to help speed runners of Resident Evil games ☆13 · Jun 5, 2023 · Updated 2 years ago
- [TMI'23] FedDM: Federated Weakly Supervised Segmentation via Annotation Calibration and Gradient De-conflicting ☆14 · Mar 11, 2023 · Updated 3 years ago
- Code for the WWW'23 paper "Sanitizing Sentence Embeddings (and Labels) for Local Differential Privacy" ☆12 · Feb 20, 2023 · Updated 3 years ago
- ☆20 · May 30, 2024 · Updated last year
- Two implementations of ZeRO-1 optimizer sharding in JAX ☆14 · Jun 11, 2023 · Updated 2 years ago
- Scalable framework for comparing metric measure spaces with up to 1M points. ☆16 · Apr 6, 2021 · Updated 5 years ago
- ☆11 · Feb 22, 2025 · Updated last year
- Fast and low-memory attention layer written in CUDA ☆20 · Jul 14, 2023 · Updated 2 years ago
- Learn CUDA with PyTorch ☆274 · Apr 9, 2026 · Updated last week
- CodemixedNLP: An Extensible and Open NLP Toolkit for Code-Switching ☆18 · Mar 29, 2021 · Updated 5 years ago