ksm26 / Efficiently-Serving-LLMs

Learn the ins and outs of efficiently serving Large Language Models (LLMs). Dive into optimization techniques, including KV caching and Low-Rank Adapters (LoRA), and gain hands-on experience with Predibase's LoRAX inference server.
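As a rough illustration of the KV-caching idea covered in the course material (a toy sketch, not code from this repo): during autoregressive decoding, each token's key and value projections are computed once and appended to a cache, so every decode step attends over the cached tensors instead of reprojecting the whole prefix. The projection matrices and dimensions below are made up for the example.

```python
import numpy as np

def attend(q, K, V):
    # Scaled dot-product attention for a single query vector.
    scores = K @ q / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

d = 8  # toy hidden size (assumption for illustration)
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

k_cache, v_cache = [], []
for step in range(4):                 # pretend we decode 4 tokens
    x = rng.normal(size=d)            # hidden state of the newest token
    k_cache.append(x @ Wk)            # project K/V once and cache them
    v_cache.append(x @ Wv)
    # attend over all cached keys/values; no recomputation of the prefix
    out = attend(x @ Wq, np.array(k_cache), np.array(v_cache))

print(len(k_cache))  # one cached K/V pair per generated token -> 4
```

Without the cache, step *t* would redo *t* key/value projections; with it, each step does exactly one, which is the core of the serving-time speedup.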
☆ 11 · Updated 11 months ago
