modelize-ai / LLM-Inference-Deployment-Tutorial

Tutorial for LLM developers on engine design, service deployment, evaluation/benchmarking, etc. Provides a client/server (C/S) style optimized LLM inference engine.

☆19

Alternatives and similar repositories for LLM-Inference-Deployment-Tutorial

Users interested in LLM-Inference-Deployment-Tutorial are comparing it to the libraries listed below.


FreedomIntelligence / FastLLM
Fast LLM training codebase with dynamic strategy selection [DeepSpeed+Megatron+FlashAttention+CudaFusionKernel+Compiler]
☆39 · Updated last year
Zheng0428 / COIG-Kun
☆36 · Updated 9 months ago
jiahe7ay / infini-mini-transformer
A personal reimplementation of Google's Infini-transformer using a small 2B model. The project includes both model and train…
☆57 · Updated last year
OpenLLMAI / OpenLLMDE
OpenLLMDE: An open source data engineering framework for LLMs
☆17 · Updated last year
vllm-project / vllm-nccl
Manages the vllm-nccl dependency
☆17 · Updated last year
18907305772 / FuseAI
FuseAI Project
☆87 · Updated 5 months ago
OPPO-PersonalAI / OAgents
☆33 · Updated this week
OpenLMLab / scaling-rope
Code for "Scaling Laws of RoPE-based Extrapolation"
☆73 · Updated last year
hellangleZ / Qwen3_autothink_adapter
Implemented a script that automatically adjusts Qwen3's inference and non-inference capabilities, based on an OpenAI-like API. The infere…
☆20 · Updated last month
NormXU / Consistent-DynamicNTKRoPE
An Experiment on Dynamic NTK Scaling RoPE
☆64 · Updated last year
tanyuqian / cappy
NeurIPS 2023 - Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer
☆43 · Updated last year
BaichuanSEED / BaichuanSEED.github.io
Official Repository for Paper "BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Compet…
☆18 · Updated 10 months ago
wangguojim / LargeScale
☆19 · Updated last year
zenrran4nlp / Awesome-LLM-Inference-Serving
☆36 · Updated 2 months ago
bigcode-project / astraios
Astraios: Parameter-Efficient Instruction Tuning Code Language Models
☆58 · Updated last year
WalkerMitty / Fast-Llama2
Fast instruction tuning with Llama2
☆11 · Updated last year
zexuanqiu / CLongEval
CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models
☆40 · Updated last year
bentoml / BentoLMDeploy
Self-host LLMs with LMDeploy and BentoML
☆20 · Updated 2 weeks ago
InternLM / Condor
[ACL 2025] The official PyTorch implementation of the paper "Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement"
☆30 · Updated last month
WENGSYX / LMTuner
LMTuner: Make the LLM Better for Everyone
☆35 · Updated last year
yifeiwang77 / Self-Correction
☆20 · Updated 7 months ago
dmarx / zero-shot-intent-classifier
Minimal zero-shot intent classifier for arbitrary intent slot filling, via LLM prompting with LangChain.
☆33 · Updated 2 years ago
GeeeekExplorer / cupytorch
A small framework that mimics PyTorch using CuPy or NumPy
☆37 · Updated 3 years ago
zzlgreat / smart_agent
☆105 · Updated last year
M1n9X / GraphRAG_Lite
☆16 · Updated 11 months ago
zhaochenyang20 / Prompt2Model-Self-Guide
SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning (COLM 2024 accepted paper)
☆32 · Updated last year
Bui1dMySea / MemLong
☆94 · Updated 6 months ago
LLM360 / crystalcoder-data-prep
Data preparation code for CrystalCoder 7B LLM
☆45 · Updated last year
mrcabbage972 / simple-toolformer
A Python implementation of Toolformer using Huggingface Transformers
☆14 · Updated 2 years ago
CLUEbenchmark / SuperCLUE-Industry
A native Chinese industrial evaluation benchmark
☆13 · Updated last year