A tutorial for LLM developers covering engine design, service deployment, evaluation and benchmarking, and related topics. It also provides an optimized, client/server (C/S) style LLM inference engine.
☆19 · Sep 5, 2023 · Updated 2 years ago
Alternatives and similar repositories for LLM-Inference-Deployment-Tutorial
Users who are interested in LLM-Inference-Deployment-Tutorial are comparing it to the libraries listed below
- Deploy, launch and use LLMs on AWS ☆16 · Jun 2, 2023 · Updated 2 years ago
- Learn the ins and outs of efficiently serving Large Language Models (LLMs). Dive into optimization techniques, including KV caching and L… ☆18 · Apr 12, 2024 · Updated last year
- SadTalker gradio_demo.py file with code section that allows you to set the eye blink and pose reference videos for the software to use wh… ☆11 · Jun 20, 2023 · Updated 2 years ago
- Code associated with the paper **Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees**. ☆28 · Apr 25, 2023 · Updated 2 years ago
- Website for Stanford SysML Seminar ☆17 · Oct 27, 2025 · Updated 4 months ago
- Code for the paper Deformable Butterfly: A Highly Structured and Sparse Linear Transform. ☆16 · Nov 1, 2021 · Updated 4 years ago
- Cleaned test data list of DukeMTMC-reID, ICCV2021 ☆15 · Aug 26, 2021 · Updated 4 years ago
- On-device real-time RAG App built using Jina Reader, Mediapipe, Gemma 2b IT LLM. ☆15 · Apr 15, 2024 · Updated last year
- Machine Learning System ☆14 · May 11, 2020 · Updated 5 years ago
- A UI designer for constructing AI applications with OpenSearch ☆16 · Mar 13, 2026 · Updated last week
- ☆17 · Jun 14, 2023 · Updated 2 years ago
- TaskWeaver Plugins ☆12 · Jan 28, 2024 · Updated 2 years ago
- ☆19 · Sep 15, 2022 · Updated 3 years ago
- PyTorch code examples for measuring the performance of collective communication calls in AI workloads ☆19 · Sep 18, 2025 · Updated 6 months ago
- netbeacon - monitoring your network capture, NIDS or network analysis process ☆19 · Oct 26, 2013 · Updated 12 years ago
- ATC23 AE ☆46 · May 11, 2023 · Updated 2 years ago
- Code showing how to use a model based on the ML model base class. ☆10 · Sep 30, 2022 · Updated 3 years ago
- auto-rust is an experimental project that automatically generates Rust code with LLMs (Large Language Models) during compilation, utilizing… ☆46 · Nov 12, 2024 · Updated last year
- ☆26 · Oct 2, 2023 · Updated 2 years ago
- Streaming source separation for music and speech files, using the Open-Unmix LSTM architecture. ☆21 · Dec 8, 2022 · Updated 3 years ago
- Explore Inter-layer Expert Affinity in MoE Model Inference ☆16 · May 6, 2024 · Updated last year
- Scripts for reading, extracting, and organizing data from either HTML or PDF documents and preparing it to be converted into embeddings f… ☆13 · Aug 26, 2024 · Updated last year
- ☆17 · Apr 3, 2024 · Updated last year
- ☆15 · Oct 22, 2023 · Updated 2 years ago
- Welcome to our SaaS AI platform repository! 🚀 Our SaaS AI platform is designed to empower businesses with cutting-edge Artificial Intel… ☆18 · Jul 22, 2023 · Updated 2 years ago
- UI for Pandas AI, the Python library that makes dataframes conversational. ☆14 · Jun 6, 2023 · Updated 2 years ago
- ☆16 · Sep 10, 2024 · Updated last year
- Experimentation on Google's Gemma model ☆16 · Mar 6, 2024 · Updated 2 years ago
- ☆11 · Apr 3, 2023 · Updated 2 years ago
- This repo consists of code for plotting top loss images ☆13 · May 18, 2020 · Updated 5 years ago
- A simple example to showcase machine learning model deployment with an API ☆10 · Mar 7, 2022 · Updated 4 years ago
- ☆15 · Jul 18, 2023 · Updated 2 years ago
- Code Repository for Blog - How to Productionize Large Language Models (LLMs) ☆12 · Mar 27, 2024 · Updated last year
- Materials for Machine Learning with H2O Open Platform at ODSC Masterclass Summit 2017 ☆12 · Mar 2, 2017 · Updated 9 years ago
- ☆11 · Sep 22, 2017 · Updated 8 years ago
- Control LLM generation format efficiently. A simple version of microsoft/aici in vllm and transformers ☆14 · Jun 7, 2024 · Updated last year
- ☆40 · Mar 25, 2023 · Updated 2 years ago
- PSTensor provides a way to hack the memory management of tensors in TensorFlow and PyTorch by defining your own C++ Tensor Class. ☆10 · Feb 10, 2022 · Updated 4 years ago
- ☆12 · Jul 9, 2021 · Updated 4 years ago