modelize-ai / LLM-Inference-Deployment-Tutorial
A tutorial for LLM developers covering engine design, service deployment, evaluation and benchmarking, and related topics. Provides an optimized client/server (C/S) style LLM inference engine.
☆19 · Updated last year
Alternatives and similar repositories for LLM-Inference-Deployment-Tutorial:
Users who are interested in LLM-Inference-Deployment-Tutorial are comparing it to the repositories listed below:
- NeurIPS 2023 - Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer ☆41 · Updated last year
- FuseAI Project ☆84 · Updated 2 months ago
- Understanding the correlation between different LLM benchmarks ☆29 · Updated last year
- Implementation of "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models" ☆42 · Updated 4 months ago
- Fast LLM training codebase with dynamic strategy selection [DeepSpeed + Megatron + FlashAttention + CudaFusionKernel + Compiler] ☆36 · Updated last year
- Official Repository for Paper "BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Compet… ☆18 · Updated 7 months ago
- Self-host LLMs with LMDeploy and BentoML ☆18 · Updated 2 weeks ago
- Manages vllm-nccl dependency ☆17 · Updated 9 months ago
- LMTuner: Make the LLM Better for Everyone ☆34 · Updated last year
- An Experiment on Dynamic NTK Scaling RoPE ☆62 · Updated last year
- [EMNLP 2023 Industry Track] A simple prompting approach that enables the LLMs to run inference in batches. ☆73 · Updated last year
- Code for Scaling Laws of RoPE-based Extrapolation ☆72 · Updated last year
- RLHF experiments on a single A100 40G GPU. Supports PPO, GRPO, REINFORCE, RAFT, RLOO, ReMax, and DeepSeek R1-Zero reproduction. ☆50 · Updated last month
- Lighter, cheaper and faster RAG toolkit (Graph RAG) supported by TargetPilot ☆44 · Updated 5 months ago
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's PyTorch Lightning suite. ☆33 · Updated last year
- Lightweight local website for displaying the performance of different chat models. ☆85 · Updated last year
- A Python implementation of Toolformer using Hugging Face Transformers ☆15 · Updated 2 years ago
- LLMs as Collaboratively Edited Knowledge Bases ☆45 · Updated last year
- Self-Controlled Memory System for LLMs ☆46 · Updated 11 months ago
- On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability ☆38 · Updated 2 months ago
- Source code for GreaTer - Gradient Over Reasoning makes Smaller Language Models Strong Prompt Optimizers ☆17 · Updated last month
- kimi-chat test data ☆7 · Updated last year
- Linear Attention Sequence Parallelism (LASP) ☆79 · Updated 9 months ago
- This is a personal reimplementation of Google's Infini-transformer, utilizing a small 2b model. The project includes both model and train… ☆56 · Updated 11 months ago
- SCREWS: A Modular Framework for Reasoning with Revisions ☆27 · Updated last year
- Longitudinal Evaluation of LLMs via Data Compression ☆32 · Updated 10 months ago