A flexible serving framework that delivers efficient and fault-tolerant LLM inference for clustered deployments.
☆92May 14, 2026Updated this week
Alternatives and similar repositories for xllm-service
Users that are interested in xllm-service are comparing it to the libraries listed below.
- A high-performance inference engine for LLM, VLM, DiT and REC models, optimized for diverse AI accelerators.☆1,272Updated this week
- a mllm inference engine for academic research☆21Jan 30, 2026Updated 3 months ago
- Labs for the 2021 Parallel Programming course taught by Zheng Qilong at USTC☆11Jan 15, 2022Updated 4 years ago
- The Intelligent Inference Scheduler for Large-scale Inference Services.☆68Feb 12, 2026Updated 3 months ago
- Large Language Model (LLM) Serving Paper and Resource List☆28Apr 16, 2026Updated last month
- An implementation of SGEMV with performance comparable to cuBLAS.☆12May 21, 2021Updated 4 years ago
- Chubby baby☆38Mar 15, 2025Updated last year
- A TimerQueue Based on Poll☆14May 13, 2019Updated 7 years ago
- ☆98Mar 26, 2025Updated last year
- GEMM by WMMA (tensor core)☆15Jul 31, 2022Updated 3 years ago
- ☆16Apr 11, 2026Updated last month
- [COLM 2025] Official code for "When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoni…☆15Oct 31, 2025Updated 6 months ago
- Undergraduate thesis project: a financial question-answering system built on a FinGLM-based multimodal large model☆31Jun 26, 2024Updated last year
- The official implementation of InfoRM [NeurIPS 2024].☆15Oct 25, 2025Updated 6 months ago
- [AAAI26] Trade-offs in Large Reasoning Models: An Empirical Analysis of Deliberative and Adaptive Reasoning over Foundational Capabilitie…☆10Feb 7, 2026Updated 3 months ago
- MobileNet SSD forward-pass detection based on Caffe☆10Nov 30, 2018Updated 7 years ago
- Code and data release of the paper Enhancing LLM Complex Problem-Solving with Hybrid Thinking and Dynamic Workflows☆15Oct 4, 2024Updated last year
- HELP: a dataset for Handling Entailments with Lexical and logical Phenomena (Ver.1.0)☆15Jul 20, 2023Updated 2 years ago
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment☆17Dec 19, 2024Updated last year
- ☆18Apr 10, 2025Updated last year
- Depict GPU memory footprint during DNN training of PyTorch☆11Nov 17, 2022Updated 3 years ago
- Benchmark tests supporting the TiledCUDA library.☆19Nov 19, 2024Updated last year
- ☆12Mar 13, 2023Updated 3 years ago
- A deep learning training framework implemented from scratch in C++ and Python☆12Nov 22, 2020Updated 5 years ago
- Scripts to prepare OXFORD VGG Face dataset☆12Mar 29, 2016Updated 10 years ago
- This is part of the zeus library, just for sharing and fun.☆35Apr 5, 2023Updated 3 years ago
- a simple API to use CUPTI☆10Aug 19, 2025Updated 9 months ago
- A toolkit for developers to simplify the transformation of nn.Module instances. It's now corresponding to Pytorch.fx.☆13Apr 7, 2023Updated 3 years ago
- ☆11Apr 5, 2021Updated 5 years ago
- Short RL☆18Apr 16, 2026Updated last month
- DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling☆24Updated this week
- Collection of LLM completions for reasoning-gym task datasets☆31Jul 4, 2025Updated 10 months ago
- ☕️ A vscode extension for netron, support *.pdmodel, *.nb, *.onnx, *.pb, *.h5, *.tflite, *.pth, *.pt, *.mnn, *.param, etc.☆14Jun 4, 2023Updated 2 years ago
- This repo holds the research projects of our lab.☆11Jan 20, 2024Updated 2 years ago
- dataloader for mocap dataset☆35Oct 21, 2025Updated 6 months ago
- Stable Diffusion in TensorRT 8.5+☆15Mar 19, 2023Updated 3 years ago
- ☆25Dec 30, 2025Updated 4 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs☆17Jun 3, 2024Updated last year
- ☆18Nov 30, 2025Updated 5 months ago