changjonathanc / flex-nano-vllm
FlexAttention-based, minimal vLLM-style inference engine for fast Gemma 2 inference.
☆303 · Updated 2 weeks ago
Alternatives and similar repositories for flex-nano-vllm
Users interested in flex-nano-vllm are comparing it to the libraries listed below.
Sorting:
- ☆225 · Updated last month
- An extension of the nanoGPT repository for training small MoE models. ☆212 · Updated 8 months ago
- Load compute kernels from the Hub. ☆327 · Updated last week
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand. ☆196 · Updated 5 months ago
- Simple & Scalable Pretraining for Neural Architecture Research. ☆300 · Updated 3 weeks ago
- ☆178 · Updated last year
- ☆907 · Updated 2 weeks ago
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash…