zhihu / ZhiLight
A highly optimized LLM inference acceleration engine for Llama and its variants.
☆835 · Updated this week
Alternatives and similar repositories for ZhiLight:
Users who are interested in ZhiLight are comparing it to the libraries listed below.
- An acceleration library that supports arbitrary bit-width combinatorial quantization operations ☆213 · Updated 3 months ago
- Unified KV Cache Compression Methods for Auto-Regressive Models ☆854 · Updated 3 weeks ago
- TVM Documentation in Chinese Simplified / TVM 中文文档 ☆801 · Updated 2 weeks ago
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications. ☆607 · Updated last week
- ☆311 · Updated last week
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including … ☆220 · Updated this week
- FlagPerf is an open-source software platform for benchmarking AI chips. ☆319 · Updated 3 weeks ago
- An Innovative Agent Framework Driven by KG Engine ☆650 · Updated 2 weeks ago
- ☆61 · Updated 2 months ago
- Align Anything: Training All-modality Model with Feedback ☆938 · Updated this week
- MIXQ: Taming Dynamic Outliers in Mixed-Precision Quantization by Online Prediction ☆76 · Updated 3 months ago
- ☆68 · Updated 2 months ago
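- [High-performance OpenAI LLM service] A pure C++ high-performance OpenAI-compatible LLM service built on GPRS + TensorRT-LLM + Tokenizers.cpp. Supports chat and function-call modes, AI agents, distributed multi-GPU inference, multimodal input, and a Gradio chat interface. ☆91 · Updated this week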
- xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism ☆1,192 · Updated this week
- The repository has collected a batch of noteworthy MLSys bloggers (Algorithms/Systems) ☆153 · Updated 3 weeks ago
- ☆37 · Updated this week
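- [Deep learning model deployment framework] Supports tf/torch/trt/trtllm/vllm and more NN frameworks, dynamic batching and streaming modes, and both Python and C++. Limitable, extensible, and high-performance. Helps users quickly deploy models to production and, via HTTP/RPC interfaces, … ☆153 · Updated 2 weeks ago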
- A unified end-to-end machine intelligence platform ☆527 · Updated 4 months ago
- FlagScale is a large model toolkit based on open-sourced projects. ☆209 · Updated this week
- GLake: optimizing GPU memory management and IO transmission. ☆424 · Updated 2 months ago
- llm-inference is a platform for publishing and managing llm inference, providing a wide range of out-of-the-box features for model deploy… ☆74 · Updated 8 months ago
- Inferflow is an efficient and highly configurable inference engine for large language models (LLMs). ☆236 · Updated 10 months ago
- A flexible and efficient training framework for large-scale alignment tasks ☆281 · Updated this week
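- Trains an LLM from scratch with DeepSpeed through the pretraining and SFT stages, verifying the LLM's ability to learn knowledge, understand language, and answer questions. ☆136 · Updated 6 months ago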
- Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Support [video/image/multi-image] {sft/conv… ☆395 · Updated last month
- LLM notes, including model inference, transformer model structure, and llm framework code analysis notes. ☆421 · Updated this week
- Easiest and laziest way for building multi-agent LLMs applications. ☆1,072 · Updated last week
- A more beginner-friendly Chinese-language tutorial for nanoGPT ☆112 · Updated 8 months ago
- [NeurIPS 2024] BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models ☆230 · Updated 2 months ago
- Official codes for ACL 2023 paper "WebCPM: Interactive Web Search for Chinese Long-form Question Answering" ☆915 · Updated last year