ksm26 / Efficiently-Serving-LLMs
Learn the ins and outs of efficiently serving Large Language Models (LLMs). Dive into optimization techniques, including KV caching and Low Rank Adapters (LoRA), and gain hands-on experience with Predibase’s LoRAX framework inference server.
☆16 · Updated last year
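As a rough illustration of the first technique the course names, here is a minimal sketch of KV caching during autoregressive decoding. It uses a toy single-head attention layer in NumPy; the weights, dimensions, and helper names are illustrative assumptions, not code from the course or from LoRAX.

```python
# Minimal KV-caching sketch (toy single-head attention, NumPy only).
# Assumption: random weights and a hypothetical attend() helper, for illustration.
import numpy as np

rng = np.random.default_rng(0)
d_model = 16
W_q, W_k, W_v = (rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(3))

def attend(q, K, V):
    # Scaled dot-product attention of the new token's query against all cached keys/values.
    scores = q @ K.T / np.sqrt(d_model)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

# KV cache: keys/values for every position generated so far.
K_cache = np.empty((0, d_model))
V_cache = np.empty((0, d_model))

for step in range(5):
    x = rng.standard_normal(d_model)          # hidden state of the newest token
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    # Append only the new token's key/value instead of recomputing the whole
    # prefix each step -- this reuse is the saving that KV caching provides.
    K_cache = np.vstack([K_cache, k])
    V_cache = np.vstack([V_cache, v])
    out = attend(q, K_cache, V_cache)
    print(f"step {step}: cache length {K_cache.shape[0]}, output shape {out.shape}")
```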
Alternatives and similar repositories for Efficiently-Serving-LLMs
Users interested in Efficiently-Serving-LLMs are comparing it to the libraries listed below.
- Fine-tune an LLM to perform batch inference and online serving. ☆112 · Updated 3 weeks ago
- A set of scripts and notebooks on LLM finetuning and dataset creation ☆111 · Updated 8 months ago
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ☆33 · Updated last month
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets. ☆77 · Updated 8 months ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆49 · Updated 11 months ago
- ☆20 · Updated last year
- A collection of hands-on notebooks for LLM practitioners ☆48 · Updated 5 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆78 · Updated 9 months ago
- Experiments with inference on Llama ☆104 · Updated last year
- Includes examples of how to evaluate LLMs ☆23 · Updated 7 months ago
- Sample notebooks and prompts for LLM evaluation ☆135 · Updated 2 weeks ago
- ☆159 · Updated this week
- Obsolete version of CUDA-mode repo -- use cuda-mode/lectures instead ☆25 · Updated last year
- Supervised instruction finetuning for LLMs with the HF trainer and DeepSpeed ☆35 · Updated last year
- Notes on Direct Preference Optimization ☆19 · Updated last year
- Low-latency, high-accuracy custom query routers for humans and agents. Built by Prithivi Da ☆105 · Updated 2 months ago
- Complete implementation of Llama2 with/without KV cache & inference 🚀 ☆46 · Updated last year
- ☆77 · Updated last year
- Machine Learning serving focused on GenAI, with simplicity as the top priority. ☆59 · Updated 2 months ago
- Code repository for the blog post "How to Productionize Large Language Models (LLMs)" ☆11 · Updated last year
- LLM_library is a comprehensive repository that serves as a one-stop resource for hands-on code and insightful summaries. ☆69 · Updated last year
- Fine-tuning large language models (LLMs) is crucial for enhancing performance on domain-specific tasks. This comprehensiv… ☆12 · Updated 9 months ago
- A RAG that can scale 🧑🏻💻 ☆11 · Updated last year
- A framework for simulating e-commerce data and interactions that can be used to build recommendation systems ☆10 · Updated last year
- This repository contains the code for dataset curation and finetuning of the instruct variant of the Bilingual OpenHathi model. The resultin… ☆23 · Updated last year
- Distributed training (multi-node) of a Transformer model ☆71 · Updated last year
- 🤗 Collection of examples on how to train, deploy, and monitor HuggingFace models in Google Cloud Vertex AI ☆21 · Updated last year
- ☆23 · Updated last year
- ML/DL Math and Method notes ☆61 · Updated last year
- How to quickly serve an LLM using FastAPI, Celery, and Redis ☆15 · Updated last year