modelize-ai / LLM-Inference-Deployment-Tutorial
Tutorial for LLM developers covering engine design, service deployment, evaluation/benchmarking, etc. Provides an optimized client/server (C/S) style LLM inference engine.
☆19 Updated 2 years ago
Alternatives and similar repositories for LLM-Inference-Deployment-Tutorial
Users who are interested in LLM-Inference-Deployment-Tutorial are comparing it to the libraries listed below.
- NeurIPS 2023 - Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer ☆44 Updated last year
- Fast LLM Training CodeBase With dynamic strategy choosing [Deepspeed+Megatron+FlashAttention+CudaFusionKernel+Compiler] ☆41 Updated last year
- ☆46 Updated 7 months ago
- This is a text generation method which returns a generator, streaming out each token in real-time during inference, based on Huggingface/… ☆97 Updated last year
- FuseAI Project ☆87 Updated 10 months ago
- Open sourced backend for Martian's LLM Inference Provider Leaderboard ☆19 Updated last year
- [EMNLP 2023 Industry Track] A simple prompting approach that enables the LLMs to run inference in batches. ☆76 Updated last year
- a Fine-tuned LLaMA that is Good at Arithmetic Tasks ☆178 Updated 2 years ago
- ☆42 Updated last year
- Inference script for Meta's LLaMA models using Hugging Face wrapper ☆110 Updated 2 years ago
- ☆122 Updated last year
- Benchmark baseline for retrieval qa applications ☆118 Updated last year
- Codebase accompanying the Summary of a Haystack paper. ☆79 Updated last year
- Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models ☆99 Updated 2 years ago
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets. ☆78 Updated last year
- This is a personal reimplementation of Google's Infini-transformer, utilizing a small 2b model. The project includes both model and train… ☆58 Updated last year
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts ☆40 Updated last year
- Open Implementations of LLM Analyses ☆107 Updated last year
- ☆17 Updated last year
- Lighter, cheaper and faster RAG toolkit (Graph RAG) supported by TargetPilot ☆46 Updated 5 months ago
- Data preparation code for Amber 7B LLM ☆93 Updated last year
- Code for KaLM-Embedding models ☆99 Updated 4 months ago
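- Chinese financial LLM evaluation benchmark: six categories, twenty-five tasks, and graded evaluation; domestic (Chinese) models achieve grade A ☆10 Updated last year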
- ☆95 Updated 11 months ago
- Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO… ☆57 Updated this week
- ☆100 Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆16 Updated last year
- LMTuner: Make the LLM Better for Everyone ☆37 Updated 2 years ago
- Langport is a language model inference service ☆95 Updated last year
- Self-host LLMs with LMDeploy and BentoML ☆21 Updated 4 months ago