Tutorial for LLM developers about engine design, service deployment, evaluation/benchmarking, etc. Provides a client/server (C/S) style optimized LLM inference engine.
☆19 · Sep 5, 2023 · Updated 2 years ago
Alternatives and similar repositories for LLM-Inference-Deployment-Tutorial
Users that are interested in LLM-Inference-Deployment-Tutorial are comparing it to the libraries listed below.
- Deploy, launch and use LLMs on AWS ☆16 · Jun 2, 2023 · Updated 3 years ago
- ☆16 · Sep 4, 2023 · Updated 2 years ago
- ☆10 · Mar 28, 2023 · Updated 3 years ago
- Learn the ins and outs of efficiently serving Large Language Models (LLMs). Dive into optimization techniques, including KV caching and L… ☆19 · Apr 12, 2024 · Updated 2 years ago
- ☆11 · Nov 22, 2022 · Updated 3 years ago
- Website for Stanford SysML Seminar ☆17 · Oct 27, 2025 · Updated 8 months ago
- Code for the paper "Deformable Butterfly: A Highly Structured and Sparse Linear Transform". ☆16 · Nov 1, 2021 · Updated 4 years ago
- Cleaned test data list of DukeMTMC-reID, ICCV2021 ☆15 · Aug 26, 2021 · Updated 4 years ago
- On-device real-time RAG App built using Jina Reader, Mediapipe, Gemma 2b IT LLM. ☆15 · Apr 15, 2024 · Updated 2 years ago
- CLIP-based Fusion-modal Reconstructing Hashing for Unsupervised Large-scale Cross-modal Retrieval ☆14 · Aug 7, 2023 · Updated 2 years ago
- TaskWeaver Plugins ☆12 · Jan 28, 2024 · Updated 2 years ago
- ☆19 · Sep 15, 2022 · Updated 3 years ago
- [BMVC 2021] The official implementation of "DomainMix: Learning Generalizable Person Re-Identification Without Human Annotations" ☆20 · Dec 21, 2021 · Updated 4 years ago
- Super Mario is a legendary game we all cherish! In this project, we will deploy Super Mario on Amazon EKS (Elastic Kubernetes Service) us… ☆11 · Feb 3, 2026 · Updated 4 months ago
- PyTorch code examples for measuring the performance of collective communication calls in AI workloads ☆21 · Sep 18, 2025 · Updated 9 months ago
- netbeacon - monitoring your network capture, NIDS or network analysis process ☆20 · Apr 5, 2026 · Updated 2 months ago
- ATC23 AE ☆45 · May 11, 2023 · Updated 3 years ago
- Code showing how to use a model based on the ML model base class. ☆10 · Sep 30, 2022 · Updated 3 years ago
- (Unofficial) Data-Distortion Guided Self-Distillation for Deep Neural Networks (AAAI 2019) ☆14 · May 12, 2021 · Updated 5 years ago
- High-order nonlocal Hashing for unsupervised cross-modal retrieval ☆14 · Nov 11, 2023 · Updated 2 years ago
- ☆26 · Oct 2, 2023 · Updated 2 years ago
- ☆16 · May 19, 2024 · Updated 2 years ago
- Explore Inter-layer Expert Affinity in MoE Model Inference ☆16 · May 6, 2024 · Updated 2 years ago
- ☆13 · Apr 20, 2023 · Updated 3 years ago
- Source code for the paper "Similarity Search in High Dimensions via Hashing" (VLDB 1999) ☆17 · Jan 1, 2020 · Updated 6 years ago
- This repository will take you through creating a FastAPI StableDiffusion app (including Dockerfile) all the way to adding a new feature u… ☆38 · Nov 9, 2022 · Updated 3 years ago
- A compute framework for building Search, RAG, Recommendations and Analytics over complex (structured+unstructured) data, with ultra-modal… ☆12 · Sep 16, 2024 · Updated last year
- ☆18 · Apr 24, 2025 · Updated last year
- Pure TensorFlow implementation of the SGR layer as proposed in "Symbolic Graph Reasoning Meets Convolutions". ☆19 · Jun 17, 2021 · Updated 5 years ago
- ☆17 · Apr 3, 2024 · Updated 2 years ago
- ☆16 · Oct 22, 2023 · Updated 2 years ago
- Deep Semantic-Alignment Hashing (ICMR2020, Oral) ☆18 · Oct 20, 2020 · Updated 5 years ago
- [CVPR 2022 Challenge Rank 1st] The official code for V2L: Leveraging Vision and Vision-language Models into Large-scale Product Retrieval… ☆29 · Jul 30, 2022 · Updated 3 years ago
- Welcome to our SaaS AI platform repository! 🚀 Our SaaS AI platform is designed to empower businesses with cutting-edge Artificial Intel… ☆18 · Jul 22, 2023 · Updated 2 years ago
- HyDE based RAG using NVIDIA NIM. ☆16 · Mar 20, 2024 · Updated 2 years ago
- Kubeflow on OpenShift ☆14 · Jan 24, 2019 · Updated 7 years ago
- ☆16 · Sep 10, 2024 · Updated last year
- Experimentation on Google's Gemma model ☆16 · Mar 6, 2024 · Updated 2 years ago
- ☆11 · Apr 3, 2023 · Updated 3 years ago