FlexAttention-based, minimal vLLM-style inference engine for fast Gemma 2 inference.
☆348 Nov 2, 2025 Updated 7 months ago
Alternatives and similar repositories for flex-nano-vllm
Users that are interested in flex-nano-vllm are comparing it to the libraries listed below.
- Tools for visualizing neural nets ☆19 Jul 29, 2025 Updated 10 months ago
- Minimalistic 4D-parallelism distributed training framework for educational purposes ☆2,205 Aug 26, 2025 Updated 9 months ago
- Simple MPI implementation for prototyping or learning ☆316 Aug 6, 2025 Updated 10 months ago
- A PyTorch native platform for training generative AI models ☆5,416 Updated this week
- ☆21 Mar 3, 2025 Updated last year
- Simple (fast) transformer inference in PyTorch with torch.compile + lit-llama code ☆10 Aug 29, 2023 Updated 2 years ago
- The simplest, fastest repository for training/finetuning small-sized VLMs. ☆4,886 Oct 27, 2025 Updated 7 months ago
- Simple byte pair encoding mechanism for tokenization, written purely in C ☆148 Nov 11, 2024 Updated last year
- Official repository for our work on micro-budget training of large-scale diffusion models. ☆1,574 Jan 12, 2025 Updated last year
- A scalable asynchronous reinforcement learning implementation with in-flight weight updates. ☆415 Updated this week
- Implementing DeepSeek R1's GRPO algorithm from scratch ☆1,857 Apr 18, 2025 Updated last year
- NanoGPT (124M) in 90 seconds ☆5,337 Updated this week
- Custom Triton kernels for training Karpathy's nanoGPT. ☆19 Oct 21, 2024 Updated last year
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆194 Jan 19, 2026 Updated 4 months ago
- Our library for RL environments + evals ☆4,167 Updated this week
- Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs" ☆618 Oct 7, 2025 Updated 8 months ago
- ☆94 Jun 30, 2025 Updated 11 months ago
- An implementation of PSGD Kron second-order optimizer for PyTorch ☆102 Jul 24, 2025 Updated 10 months ago
- UNet diffusion model in pure CUDA ☆658 Jun 28, 2024 Updated last year
- Archer2.0 evolves from its predecessor by introducing ASPO, which overcomes fundamental PPO-Clip limitations to prevent premature converg… ☆31 Oct 10, 2025 Updated 7 months ago
- ☆52 May 19, 2025 Updated last year
- Learn CUDA with PyTorch ☆321 Jun 1, 2026 Updated last week
- [Developmental] Quarto Extension to Enable Google Colaboratory Links with Quarto Documents ☆17 May 18, 2025 Updated last year
- Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling ☆479 May 17, 2025 Updated last year
- ☆19 Oct 3, 2022 Updated 3 years ago
- CIFAR-10 speedrun: Trains to 94% accuracy in 1.98 seconds on a single NVIDIA A100 GPU. ☆78 Oct 17, 2025 Updated 7 months ago
- Checkpoint-engine is a simple middleware to update model weights in LLM inference engines ☆962 Updated this week
- LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation ☆35 Feb 26, 2026 Updated 3 months ago
- A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems. ☆4,331 May 17, 2026 Updated 3 weeks ago
- Official repository for Flash Local Linear Attention ☆36 May 28, 2026 Updated last week
- Tile primitives for speedy kernels ☆3,405 May 27, 2026 Updated last week
- [NeurIPS'25 Spotlight] Adaptive Attention Sparsity with Hierarchical Top-p Pruning ☆101 Apr 20, 2026 Updated last month
- Official Code Repository for the paper "Continuous Diffusion Model for Language Modeling" (NeurIPS 2025). ☆72 Sep 25, 2025 Updated 8 months ago
- FlexAttention w/ FlashAttention3 Support ☆27 Oct 5, 2024 Updated last year
- Official implementation of TBA for async LLM post-training. ☆31 Nov 5, 2025 Updated 7 months ago
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆60 Oct 18, 2025 Updated 7 months ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆600 May 13, 2026 Updated 3 weeks ago
- Stable and Efficient Reinforcement Learning for Trillion-Parameter LLMs ☆133 May 30, 2026 Updated last week
- Cookbook of SGLang - Recipe ☆135 May 5, 2026 Updated last month