A flexible serving framework that delivers efficient and fault-tolerant LLM inference for clustered deployments.
☆94 · Jun 22, 2026 · Updated last week
Alternatives and similar repositories for xllm-service
Users who are interested in xllm-service are comparing it to the libraries listed below.
- An MLLM inference engine for academic research ☆21 · Jan 30, 2026 · Updated 4 months ago
- Lab assignments for Zheng Qilong's 2021 Parallel Programming course at USTC ☆11 · Jan 15, 2022 · Updated 4 years ago
- The Intelligent Inference Scheduler for Large-scale Inference Services. ☆68 · Feb 12, 2026 · Updated 4 months ago
- 胖宝宝 ("Chubby Baby") ☆40 · Mar 15, 2025 · Updated last year
- 📚 PDF files of classic technical books, continuously updated... ☆13 · Jan 21, 2019 · Updated 7 years ago
- A TimerQueue based on poll ☆14 · May 13, 2019 · Updated 7 years ago
- ☆97 · Mar 26, 2025 · Updated last year
- An object-oriented interface for abstracting away the ugly parts of ad server APIs ☆14 · Apr 8, 2016 · Updated 10 years ago
- A high-performance inference system for large language models, designed for production environments. ☆500 · Dec 19, 2025 · Updated 6 months ago
- Cascade Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow, based on matterport mrcnn ☆29 · Mar 24, 2020 · Updated 6 years ago
- Dataset for the AAAI paper "Natural Language Inference in Context - Investigating Contextual Reasoning over Long Texts" ☆11 · Nov 18, 2022 · Updated 3 years ago
- Undergraduate thesis project: a multimodal large-model financial question-answering system based on FinGLM ☆31 · Jun 26, 2024 · Updated 2 years ago
- cpp rotation album: a 3D rotating photo album implemented in C++ with Eigen, reproducing material from GAMES101 ☆12 · Jul 25, 2022 · Updated 3 years ago
- OrqueIO main source code repository ☆39 · Updated this week
- This example builds on the parallel-forall repo's separate compilation example by adding CMake to it. ☆17 · Nov 14, 2017 · Updated 8 years ago
- Code and data release for the paper "Enhancing LLM Complex Problem-Solving with Hybrid Thinking and Dynamic Workflows" ☆15 · Oct 4, 2024 · Updated last year
- ☆12 · Dec 21, 2022 · Updated 3 years ago
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment ☆18 · Dec 19, 2024 · Updated last year
- Official implementation of SimFlow ☆32 · Dec 16, 2025 · Updated 6 months ago
- Implementation of Hyena Hierarchy in JAX ☆10 · Apr 30, 2023 · Updated 3 years ago
- Depicts the GPU memory footprint during DNN training in PyTorch ☆11 · Nov 17, 2022 · Updated 3 years ago
- Benchmark tests supporting the TiledCUDA library. ☆19 · Nov 19, 2024 · Updated last year
- Official Code Repository for [AutoScale📈: Scale-Aware Data Mixing for Pre-Training LLMs] Published as a conference paper at **COLM 2025*… ☆14 · Aug 8, 2025 · Updated 10 months ago
- ☆12 · Sep 1, 2023 · Updated 2 years ago
- Generate simple index ranges in C++ and CUDA C++ ☆39 · Jun 14, 2023 · Updated 3 years ago
- The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning (NeurIPS 2022) ☆16 · Feb 11, 2023 · Updated 3 years ago
- ☆12 · Mar 13, 2023 · Updated 3 years ago
- Implementing a deep learning training framework from scratch in C++ and Python ☆12 · Nov 22, 2020 · Updated 5 years ago
- Scripts to prepare the Oxford VGG Face dataset ☆12 · Mar 29, 2016 · Updated 10 years ago
- This is part of the zeus library, just for sharing and fun. ☆35 · Apr 5, 2023 · Updated 3 years ago
- A simple API for using CUPTI ☆10 · Aug 19, 2025 · Updated 10 months ago
- A toolkit for developers to simplify the transformation of nn.Module instances. It now corresponds to torch.fx in PyTorch. ☆13 · Apr 7, 2023 · Updated 3 years ago
- FP8 flash attention for the Ada architecture, implemented using the CUTLASS repository ☆82 · Aug 12, 2024 · Updated last year
- DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling ☆28 · Jun 20, 2026 · Updated last week
- Short RL ☆18 · Apr 16, 2026 · Updated 2 months ago
- ☕️ A VS Code extension for Netron, supporting *.pdmodel, *.nb, *.onnx, *.pb, *.h5, *.tflite, *.pth, *.pt, *.mnn, *.param, etc. ☆14 · Jun 4, 2023 · Updated 3 years ago
- Collection of LLM completions for reasoning-gym task datasets ☆31 · Jul 4, 2025 · Updated 11 months ago
- This repo holds the research projects of our lab. ☆11 · Jan 20, 2024 · Updated 2 years ago
- [CVPR2026] ColaVLA: Leveraging Cognitive Latent Reasoning for Hierarchical Parallel Trajectory Planning in Autonomous Driving & [CVPR2025… ☆56 · Mar 26, 2026 · Updated 3 months ago