modelize-ai / LLM-Inference-Deployment-Tutorial
Tutorial for LLM developers covering engine design, service deployment, evaluation/benchmarking, and more. Provides an optimized client/server (C/S) style LLM inference engine.
☆19 · Updated 2 years ago
Alternatives and similar repositories for LLM-Inference-Deployment-Tutorial
Users that are interested in LLM-Inference-Deployment-Tutorial are comparing it to the libraries listed below
- Fast LLM training codebase with dynamic strategy selection [DeepSpeed+Megatron+FlashAttention+CudaFusionKernel+Compiler] ☆41 · Updated last year
- This is a text generation method which returns a generator, streaming out each token in real-time during inference, based on Huggingface/… ☆97 · Updated last year
- NeurIPS 2023 - Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer ☆43 · Updated last year
- FuseAI Project ☆87 · Updated 7 months ago
- Implementation of "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models" ☆40 · Updated 10 months ago
- Inference script for Meta's LLaMA models using Hugging Face wrapper ☆110 · Updated 2 years ago
- a Fine-tuned LLaMA that is Good at Arithmetic Tasks ☆177 · Updated 2 years ago
- ☆97 · Updated 11 months ago
- Repo of ACL 2025 Paper "Quantification of Large Language Model Distillation" ☆92 · Updated last month
- A collection of reproducible inference engine benchmarks ☆33 · Updated 5 months ago
- Open Implementations of LLM Analyses ☆107 · Updated 11 months ago
- Large Scale Distributed Model Training strategy with Colossal AI and Lightning AI ☆56 · Updated 2 years ago
- Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models ☆97 · Updated last year
- [EMNLP 2023 Industry Track] A simple prompting approach that enables the LLMs to run inference in batches. ☆77 · Updated last year
- Manages vllm-nccl dependency ☆17 · Updated last year
- This is a personal reimplementation of Google's Infini-transformer, utilizing a small 2b model. The project includes both model and train… ☆58 · Updated last year
- Official repository for RAGViz: Diagnose and Visualize Retrieval-Augmented Generation [EMNLP 2024] ☆85 · Updated 8 months ago
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts ☆40 · Updated last year
- ☆32 · Updated last year
- An Experiment on Dynamic NTK Scaling RoPE ☆64 · Updated last year
- ☆42 · Updated 4 months ago
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. COLM 2024 Accepted Paper ☆33 · Updated last year
- Understanding the correlation between different LLM benchmarks ☆29 · Updated last year
- ☆17 · Updated last year
- Leveraging large language models for text-to-SQL synthesis, this project fine-tunes WizardLM/WizardCoder-15B-V1.0 with QLoRA on a custom … ☆44 · Updated last year
- ☆85 · Updated 2 years ago
- A list of LLM benchmark frameworks. ☆70 · Updated last year
- Data preparation code for Amber 7B LLM ☆93 · Updated last year
- Langchain implementation of HuggingGPT ☆133 · Updated 2 years ago
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets. ☆78 · Updated 11 months ago