modelize-ai / LLM-Inference-Deployment-Tutorial
Tutorial for LLM developers covering engine design, service deployment, evaluation and benchmarking, etc. Provides an optimized client/server (C/S) style LLM inference engine.
☆19Updated last year
Alternatives and similar repositories for LLM-Inference-Deployment-Tutorial
Users who are interested in LLM-Inference-Deployment-Tutorial compare it to the repositories listed below.
- FuseAI Project☆87Updated 7 months ago
- Fast LLM training codebase with dynamic strategy selection [DeepSpeed+Megatron+FlashAttention+CudaFusionKernel+Compiler]☆41Updated last year
- Open Implementations of LLM Analyses☆106Updated 10 months ago
- ☆41Updated 4 months ago
- NeurIPS 2023 - Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer☆43Updated last year
- ☆59Updated 8 months ago
- Benchmark baseline for retrieval qa applications☆116Updated last year
- A collection of reproducible inference engine benchmarks☆32Updated 4 months ago
- A list of LLM benchmark frameworks.☆70Updated last year
- [ICLR 2024] Skeleton-of-Thought: Prompting LLMs for Efficient Parallel Generation☆175Updated last year
- Repo of ACL 2025 Paper "Quantification of Large Language Model Distillation"☆91Updated last month
- LLMem: GPU Memory Estimation for Fine-Tuning Pre-Trained LLMs☆22Updated 3 months ago
- This is a text generation method which returns a generator, streaming out each token in real-time during inference, based on Huggingface/…☆97Updated last year
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. COLM 2024 Accepted Paper☆33Updated last year
- ☆94Updated 8 months ago
- ☆96Updated 11 months ago
- ☆17Updated last year
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets.☆78Updated 10 months ago
- Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models☆97Updated last year
- [EMNLP 2023 Industry Track] A simple prompting approach that enables LLMs to run inference in batches.☆76Updated last year
- Data preparation code for CrystalCoder 7B LLM☆45Updated last year
- ☆30Updated last year
- Official repository for RAGViz: Diagnose and Visualize Retrieval-Augmented Generation [EMNLP 2024]☆85Updated 7 months ago
- Codebase accompanying the Summary of a Haystack paper.☆79Updated 11 months ago
- 🔧 Compare how Agent systems perform on several benchmarks. 📊🚀☆100Updated 3 weeks ago
- Advanced Reasoning Benchmark Dataset for LLMs☆47Updated last year
- Data preparation code for Amber 7B LLM☆91Updated last year
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts☆40Updated last year
- Leveraging large language models for text-to-SQL synthesis, this project fine-tunes WizardLM/WizardCoder-15B-V1.0 with QLoRA on a custom …☆44Updated last year
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models☆136Updated last year