modelize-ai / LLM-Inference-Deployment-Tutorial
A tutorial for LLM developers covering engine design, service deployment, evaluation and benchmarking, and related topics. Provides an optimized client/server (C/S) style LLM inference engine.
☆19 · Updated last year
Alternatives and similar repositories for LLM-Inference-Deployment-Tutorial
Users who are interested in LLM-Inference-Deployment-Tutorial are comparing it to the repositories listed below.
- Fast LLM Training CodeBase With dynamic strategy choosing [Deepspeed+Megatron+FlashAttention+CudaFusionKernel+Compiler] ☆37 · Updated last year
- FuseAI Project ☆87 · Updated 4 months ago
- ☆36 · Updated 9 months ago
- OpenLLMDE: An open source data engineering framework for LLMs ☆17 · Updated last year
- NeurIPS 2023 - Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer ☆43 · Updated last year
- This is a personal reimplementation of Google's Infini-transformer, utilizing a small 2b model. The project includes both model and train… ☆56 · Updated last year
- [NAACL 2025] Representing Rule-based Chatbots with Transformers ☆21 · Updated 3 months ago
- Official Repository for Paper "BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Compet… ☆18 · Updated 9 months ago
- LLMs as Collaboratively Edited Knowledge Bases ☆45 · Updated last year
- Manages vllm-nccl dependency ☆17 · Updated last year
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. COLM 2024 Accepted Paper ☆32 · Updated last year
- Nano repo for RL training of LLMs ☆60 · Updated last week
- ☆33 · Updated last month
- Large Scale Distributed Model Training strategy with Colossal AI and Lightning AI ☆57 · Updated last year
- Code for paper "RankingGPT: Empowering Large Language Models in Text Ranking with Progressive Enhancement" ☆32 · Updated last year
- LMTuner: Make the LLM Better for Everyone ☆35 · Updated last year
- Implementation of "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models" ☆42 · Updated 6 months ago
- Fast instruction tuning with Llama2 ☆11 · Updated last year
- Contextual Position Encoding but with some custom CUDA Kernels https://arxiv.org/abs/2405.18719 ☆22 · Updated last year
- This is a text generation method which returns a generator, streaming out each token in real-time during inference, based on Huggingface/… ☆95 · Updated last year
- Implementations of online merging optimizers proposed by Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment ☆75 · Updated 11 months ago
- a Fine-tuned LLaMA that is Good at Arithmetic Tasks ☆178 · Updated last year
- PPTC Benchmark: Evaluating Large Language Models for PowerPoint Task Completion ☆54 · Updated last year
- Understanding the correlation between different LLM benchmarks ☆29 · Updated last year
- [EMNLP 2023 Industry Track] A simple prompting approach that enables the LLMs to run inference in batches. ☆72 · Updated last year
- ☆16 · Updated 10 months ago
- ☆17 · Updated last year
- Reformatted Alignment ☆114 · Updated 8 months ago
- ☆27 · Updated this week
- [COLM 2024] Early Weight Averaging meets High Learning Rates for LLM Pre-training ☆16 · Updated 7 months ago