☆49Apr 29, 2025Updated last year
Alternatives and similar repositories for Awesome-LLM-Inference-Serving
Users who are interested in Awesome-LLM-Inference-Serving are comparing it to the libraries listed below.
- ☆23Apr 22, 2024Updated 2 years ago
- ☆27May 12, 2026Updated 3 weeks ago
- Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow"☆90Oct 15, 2025Updated 7 months ago
- Brain tumor images classification with ResNet, EfficientNet, EfficientNet_V2 and Compact Convolutional Transformers architectures with Py…☆11Jan 5, 2023Updated 3 years ago
- Automatic course-selection tool for Shanghai University's integrated undergraduate, master's, and doctoral course registration system☆18Oct 30, 2022Updated 3 years ago
- PipeRAG: Fast Retrieval-Augmented Generation via Algorithm-System Co-design (KDD 2025)☆32Jun 14, 2024Updated last year
- ☆24Mar 7, 2025Updated last year
- Retrieve papers from DBLP based on CCF ranking☆10Dec 9, 2025Updated 6 months ago
- [ICML 2025] This is the official PyTorch implementation of "ZipAR: Accelerating Auto-regressive Image Generation through Spatial Locality…☆51Mar 25, 2025Updated last year
- [AAAI 2026] Official Repository for "Oblivionis: A Lightweight Learning and Unlearning Framework for Federated Large Language Models"☆42May 29, 2026Updated last week
- MPI Code Generation through Domain-Specific Language Models☆16Nov 19, 2024Updated last year
- This repo is for our EMNLP2023 short paper (Findings): InstOptima: Evolutionary Multi-objective Instruction Optimization via Large Langua…☆14Jan 11, 2024Updated 2 years ago
- [COLM 2025] Official PyTorch implementation of "Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models"☆72Jul 8, 2025Updated 11 months ago
- HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. 🚀 The official implementation of https://arx…☆29Feb 17, 2025Updated last year
- ☆16Sep 22, 2024Updated last year
- The code for "AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference", Qingyue Yang, Jie Wang, Xing Li, Zhihai Wang, Ch…☆29Jul 15, 2025Updated 10 months ago
- ☆16Nov 10, 2023Updated 2 years ago
- ☆15Mar 20, 2025Updated last year
- ☆51Mar 20, 2026Updated 2 months ago
- Machine Learning Based DDoS Detection (HTTP,UDP,TCP and ICMP Flood Attack)☆10Jul 3, 2018Updated 7 years ago
- [SIGCOMM 2023] PacketGame: Multi-Stream Packet Gating for Concurrent Video Inference at Scale☆15Jul 1, 2023Updated 2 years ago
- Chinese translation of Bartosz Milewski's 'Category Theory for Programmers' (《写给程序员的范畴论》). PRs welcome.☆11Oct 4, 2024Updated last year
- LLM Serving Performance Evaluation Harness☆84Feb 25, 2025Updated last year
- Awesome latest models, datasets and benchmarks on streaming/online video understanding.☆29Oct 19, 2025Updated 7 months ago
- Flow level simulation☆15Nov 22, 2015Updated 10 years ago
- ☆86May 24, 2026Updated 2 weeks ago
- BadgerTrap is a tool to instrument x86-64 TLB misses.☆13Nov 13, 2016Updated 9 years ago
- [Archived - See https://github.com/rustsbi/rustsbi/] RustSBI prototyper☆12Feb 16, 2025Updated last year
- ☆23May 29, 2023Updated 3 years ago
- A Comprehensive Survey on Long Context Language Modeling☆247May 29, 2026Updated last week
- SpotServe: Serving Generative Large Language Models on Preemptible Instances☆134Feb 22, 2024Updated 2 years ago
- ☆21Apr 3, 2026Updated 2 months ago
- This is an official GitHub repository for the paper "Towards timeout-less transport in commodity datacenter networks."☆15Sep 7, 2022Updated 3 years ago
- TARS: MinMax Token-Adaptive Preference Strategy for Hallucination Reduction in MLLMs☆24Sep 21, 2025Updated 8 months ago
- ☆115Feb 26, 2026Updated 3 months ago
- ☆14Jun 8, 2018Updated 8 years ago
- Plato is a system for viewport-adaptation-based, bitrate-adaptive VR video streaming.☆15May 1, 2018Updated 8 years ago
- This repository contains a list of papers on spatio-temporal graphs, especially GNNs on S-T graphs.☆18Sep 8, 2023Updated 2 years ago
- ☆90Jan 23, 2025Updated last year