Nano vLLM with vLLM v1's request scheduling strategy and chunked prefill
☆90Jan 26, 2026Updated 4 months ago
Alternatives and similar repositories for nano-vllm-v1
Users that are interested in nano-vllm-v1 are comparing it to the libraries listed below.
- Multilingual Pre-training with Language and Task Adaptation for Multilingual Text Style Transfer (ACL 2022)☆10Sep 22, 2022Updated 3 years ago
- Implementation for ACL 2024 paper "Meta-Task Prompting Elicits Embeddings from Large Language Models"☆12Jul 25, 2024Updated last year
- An NS-3 implementation of Poseidon congestion control algorithm (NSDI 2023).☆34Jan 28, 2024Updated 2 years ago
- ☆30Apr 17, 2025Updated last year
- ☆24Jan 16, 2025Updated last year
- An MLIR-based compiler from C/C++ to AMD-Xilinx Versal AIE☆17Aug 5, 2022Updated 3 years ago
- official code repo for paper "Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging"☆25Oct 11, 2025Updated 8 months ago
- A Simple RDMA Wheel☆22Mar 31, 2019Updated 7 years ago
- ☆14Aug 9, 2023Updated 2 years ago
- Code for the paper "A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis"☆20Jun 12, 2025Updated last year
- ☆10Mar 3, 2024Updated 2 years ago
- ☆19May 26, 2023Updated 3 years ago
- Adds a new backend, Cpu0, to LLVM 17.0.6☆12Apr 22, 2024Updated 2 years ago
- EMNLP'2023: Explore-Instruct: Enhancing Domain-Specific Instruction Coverage through Active Exploration☆36Mar 10, 2024Updated 2 years ago
- LLVM/MLIR based compiler instrumentation of AMD GPU kernels☆21Jul 13, 2025Updated 11 months ago
- YCSB-C for HWDB!☆18May 30, 2020Updated 6 years ago
- ☆21Mar 25, 2023Updated 3 years ago
- PyTorch Implementation of Online Training of Spiking Recurrent Neural Networks with Phase-Change Memory Synapses☆21Sep 25, 2021Updated 4 years ago
- Generate Versal system design from ONNX model. AI engine kernels. Sub-microsecond speeds for autoencoders.☆18Dec 29, 2024Updated last year
- An alternative Vivado custom design example (to fully Vitis) for the User Logic Partition targeting VCK5000☆14Jul 16, 2024Updated last year
- Code for data-aware compression of DeepSeek models☆74Dec 11, 2025Updated 6 months ago
- IREE C++ Template☆17Jul 30, 2024Updated last year
- CS294 AI Systems Class Website☆18Apr 25, 2022Updated 4 years ago
- ☆29Apr 7, 2025Updated last year
- Code and data for ACL2021 paper Cross-Lingual Abstractive Summarization with Limited Parallel Resources.☆46Aug 13, 2021Updated 4 years ago
- ☆69Jun 2, 2023Updated 3 years ago
- This is a cross-chip platform collection of operators and a unified neural network library.☆17Nov 3, 2023Updated 2 years ago
- ☆25May 27, 2026Updated 2 weeks ago
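- Sharing AI infra knowledge and coding exercises: getting started with the PyTorch/vLLM/SGLang frameworks ⚡️, performance acceleration 🚀, large-model fundamentals 🧠, AI software and hardware 🔧, and more☆2,497May 30, 2026Updated 2 weeks ago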
- MIT 6.172 Performance Engineering of Software Systems☆16Dec 30, 2021Updated 4 years ago
- [ICLR 2026] "Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models"☆60May 21, 2026Updated 3 weeks ago
- This is the official implementation of ECCV2024 paper "Plug and Play: A Representation Enhanced Domain Adapter for Collaborative Percepti…☆19Aug 13, 2024Updated last year
- Start AI Compiler☆51Feb 26, 2026Updated 3 months ago
- [ASPLOS'25] Towards End-to-End Optimization of LLM-based Applications with Ayo☆73Mar 11, 2026Updated 3 months ago
- A novel spatial accelerator for horizontal diffusion weather stencil computation, as described in ICS 2023 paper by Singh et al. (https:/…☆22Jul 27, 2023Updated 2 years ago
- SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs☆65Mar 25, 2025Updated last year
- Compiler Principles, Fall 2018: 6 programming assignments (PA)☆30Jan 9, 2019Updated 7 years ago
- ☆33Mar 31, 2026Updated 2 months ago
- Repository for artifact evaluation of ASPLOS 2023 paper "SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning"☆25Feb 24, 2023Updated 3 years ago