A flexible serving framework that delivers efficient and fault-tolerant LLM inference for clustered deployments.
☆89Mar 24, 2026Updated this week
Alternatives and similar repositories for xllm-service
Users that are interested in xllm-service are comparing it to the libraries listed below.
- A high-performance inference engine for LLMs, optimized for diverse AI accelerators.☆1,157Updated this week
- An MLLM inference engine for academic research☆20Jan 30, 2026Updated last month
- Lab assignments for Zheng Qilong's 2021 Parallel Programming course at USTC☆11Jan 15, 2022Updated 4 years ago
- An implementation of SGEMV with performance comparable to cuBLAS.☆12May 21, 2021Updated 4 years ago
- 胖宝宝 ("chubby baby")☆38Mar 15, 2025Updated last year
- ☆97Mar 26, 2025Updated last year
- GEMM by WMMA (tensor core)☆15Jul 31, 2022Updated 3 years ago
- A high-performance inference system for large language models, designed for production environments.☆496Dec 19, 2025Updated 3 months ago
- Dataset for AAAI paper "Natural Language Inference in Context - Investigating Contextual Reasoning over Long Texts"☆11Nov 18, 2022Updated 3 years ago
- ☆15Dec 2, 2025Updated 3 months ago
- [COLM 2025] Official code for "When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoni…☆15Oct 31, 2025Updated 4 months ago
- The official implementation of InfoRM [NeurIPS 2024].☆15Oct 25, 2025Updated 5 months ago
- cpp rotation album: a 3D rotating photo album implemented in C++ with Eigen, reproducing content from GAMES101☆12Jul 25, 2022Updated 3 years ago
- OrqueIO main source code repository☆22Mar 18, 2026Updated last week
- MobileNet SSD forward-pass (inference) detection based on Caffe☆10Nov 30, 2018Updated 7 years ago
- Code and data release of the paper Enhancing LLM Complex Problem-Solving with Hybrid Thinking and Dynamic Workflows☆14Oct 4, 2024Updated last year
- HELP: a dataset for Handling Entailments with Lexical and logical Phenomena (Ver.1.0)☆15Jul 20, 2023Updated 2 years ago
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment☆16Dec 19, 2024Updated last year
- ☆18Apr 10, 2025Updated 11 months ago
- Implementation of Hyena Hierarchy in JAX☆10Apr 30, 2023Updated 2 years ago
- Depict GPU memory footprint during DNN training of PyTorch☆11Nov 17, 2022Updated 3 years ago
- Persistent dense gemm for Hopper in `CuTeDSL`☆15Aug 9, 2025Updated 7 months ago
- ☆10Jun 29, 2020Updated 5 years ago
- Benchmark tests supporting the TiledCUDA library.☆18Nov 19, 2024Updated last year
- ☆12Sep 1, 2023Updated 2 years ago
- ☆12Mar 13, 2023Updated 3 years ago
- A deep learning training framework implemented from scratch in C++ and Python☆12Nov 22, 2020Updated 5 years ago
- dataloader for mocap dataset☆29Oct 21, 2025Updated 5 months ago
- a simple API to use CUPTI☆10Aug 19, 2025Updated 7 months ago
- FP8 flash attention implemented on the Ada architecture using the CUTLASS repository☆81Aug 12, 2024Updated last year
- ☆11Apr 5, 2021Updated 4 years ago
- Short RL☆18May 26, 2025Updated 10 months ago
- DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling☆22Mar 18, 2026Updated last week
- Collection of LLM completions for reasoning-gym task datasets☆31Jul 4, 2025Updated 8 months ago
- ☕️ A vscode extension for netron, support *.pdmodel, *.nb, *.onnx, *.pb, *.h5, *.tflite, *.pth, *.pt, *.mnn, *.param, etc.☆14Jun 4, 2023Updated 2 years ago
- This repo holds the research projects of our lab.☆11Jan 20, 2024Updated 2 years ago
- [ACL'24] Chain of Thought (CoT) is significant in improving the reasoning abilities of large language models (LLMs). However, the correla…☆47May 11, 2025Updated 10 months ago
- Stable Diffusion in TensorRT 8.5+☆15Mar 19, 2023Updated 3 years ago
- A high-throughput and memory-efficient inference and serving engine for LLMs☆17Jun 3, 2024Updated last year