Tutorial for LLM developers on engine design, service deployment, evaluation/benchmarking, etc. Provides an optimized client/server (C/S) style LLM inference engine.
☆19Sep 5, 2023Updated 2 years ago
Alternatives and similar repositories for LLM-Inference-Deployment-Tutorial
Users that are interested in LLM-Inference-Deployment-Tutorial are comparing it to the libraries listed below.
- ☆16Sep 4, 2023Updated 2 years ago
- Learn the ins and outs of efficiently serving Large Language Models (LLMs). Dive into optimization techniques, including KV caching and L…☆19Apr 12, 2024Updated 2 years ago
- ☆12Nov 22, 2022Updated 3 years ago
- Code associated with the paper **Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees**.☆29Apr 25, 2023Updated 3 years ago
- 🚀 Automatically convert unstructured data into a high-quality 'textbook' format, optimized for fine-tuning Large Language Models (LLMs)☆26Oct 15, 2023Updated 2 years ago
- Website for Stanford SysML Seminar☆17Oct 27, 2025Updated 7 months ago
- Code for the paper Deformable Butterfly: A Highly Structured and Sparse Linear Transform.☆16Nov 1, 2021Updated 4 years ago
- On-device real-time RAG App built using Jina Reader, Mediapipe, Gemma 2b IT LLM.☆15Apr 15, 2024Updated 2 years ago
- PyTorch implementation of the paper Attention-based Ensemble for Deep Metric Learning☆14Jul 20, 2020Updated 5 years ago
- TaskWeaver Plugins☆12Jan 28, 2024Updated 2 years ago
- ☆19Sep 15, 2022Updated 3 years ago
- [BMVC 2021] The official implementation of "DomainMix: Learning Generalizable Person Re-Identification Without Human Annotations"☆20Dec 21, 2021Updated 4 years ago
- netbeacon - monitoring your network capture, NIDS or network analysis process☆20Apr 5, 2026Updated 2 months ago
- ATC23 AE☆45May 11, 2023Updated 3 years ago
- Implementations of transformer models in PyTorch☆14Jun 2, 2020Updated 6 years ago
- Code showing how to use a model based on the ML model base class.☆10Sep 30, 2022Updated 3 years ago
- ☆26Oct 2, 2023Updated 2 years ago
- Explore Inter-layer Expert Affinity in MoE Model Inference☆16May 6, 2024Updated 2 years ago
- ☆14Apr 20, 2023Updated 3 years ago
- Scripts for reading, extracting, and organizing data from either HTML or PDF documents and prepare them to be converted into embeddings f…☆13Aug 26, 2024Updated last year
- auto-rust is an experimental project that automatically generates Rust code with an LLM (Large Language Model) during compilation, utilizing…☆50Nov 12, 2024Updated last year
- An implementation of the Visual Transformer Architecture introduced in the paper "Visual Transformers: Token-based Image Representation a…☆17May 27, 2021Updated 5 years ago
- ☆18Apr 24, 2025Updated last year
- ☆16Oct 22, 2023Updated 2 years ago
- [CVPR 2022 Challenge Rank 1st] The official code for V2L: Leveraging Vision and Vision-language Models into Large-scale Product Retrieval…☆29Jul 30, 2022Updated 3 years ago
- Welcome to our SaaS AI platform repository! 🚀 Our SaaS AI platform is designed to empower businesses with cutting-edge Artificial Intel…☆18Jul 22, 2023Updated 2 years ago
- HyDE based RAG using NVIDIA NIM.☆16Mar 20, 2024Updated 2 years ago
- Kubeflow on OpenShift☆14Jan 24, 2019Updated 7 years ago
- GPT: Rust Assistant. Your go-to expert in the Rust ecosystem, specializing in precise code interpretation, up-to-date crate version check…☆19Mar 4, 2025Updated last year
- ☆11Apr 3, 2023Updated 3 years ago
- Bringing large-language models and chat to web browsers. Everything runs inside the browser with no server support.☆16Updated this week
- Prompting For Named Entity Recognition☆19Sep 6, 2023Updated 2 years ago
- SLIM Models by LLMWare. A streamlit app showing the capabilities for AI Agents and Function Calls.☆21Feb 11, 2024Updated 2 years ago
- ☆22Oct 14, 2024Updated last year
- ☆15Jul 18, 2023Updated 2 years ago
- A comprehensive overview of Data Distillation and Condensation (DDC). DDC is a data-centric task where a representative (i.e., small but …☆13Dec 1, 2022Updated 3 years ago
- ☆11Sep 22, 2017Updated 8 years ago
- Control LLM generation format efficiently. A simple version of microsoft/aici in vllm and transformers☆14Jun 7, 2024Updated 2 years ago
- Code accompanying the NeurIPS 2019 paper AutoAssist: A Framework to Accelerate Training of Deep Neural Networks.☆14Oct 3, 2022Updated 3 years ago