ksm26 / Efficiently-Serving-LLMs
Learn the ins and outs of efficiently serving Large Language Models (LLMs). Dive into optimization techniques, including KV caching and Low Rank Adapters (LoRA), and gain hands-on experience with Predibase’s LoRAX framework inference server.
☆11Updated 11 months ago
Alternatives and similar repositories for Efficiently-Serving-LLMs:
Users who are interested in Efficiently-Serving-LLMs are comparing it to the libraries listed below.
- Supervised instruction finetuning for LLMs with HF Trainer and DeepSpeed☆34Updated last year
- ☆20Updated 3 years ago
- ☆16Updated last year
- The purpose of this repo is to implement LLMOps as shared in the DeepLearning.AI course☆10Updated this week
- A framework for simulating e-commerce data and interactions that can be used to build recommendation systems☆10Updated last year
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute…☆49Updated 8 months ago
- ☆24Updated last year
- Running load tests on a FastAPI application using Locust☆13Updated this week
- Codebase accompanying the Summary of a Haystack paper.☆76Updated 6 months ago
- ☆76Updated 9 months ago
- meta_llama_2finetuned_text_generation_summarization☆21Updated last year
- Code Repository for Blog - How to Productionize Large Language Models (LLMs)☆11Updated last year
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers.☆34Updated 3 months ago
- ☆20Updated 11 months ago
- A RAG that can scale 🧑🏻‍💻☆11Updated 10 months ago
- Lightweight wrapper for the independent implementation of SPLADE++ models for search & retrieval pipelines. Models and Library created by…☆29Updated 7 months ago
- ☆42Updated 5 months ago
- PyTorch implementation for MRL☆18Updated last year
- Includes examples on how to evaluate LLMs☆22Updated 4 months ago
- Writing Blog Posts with Generative Feedback Loops!☆47Updated last year
- ☆19Updated 5 months ago
- A Chainlit App Used to Showcase: Async, Caching, Additional Chainlit Methods, and more!☆11Updated 6 months ago
- End-to-End LLM Guide☆104Updated 8 months ago
- Using short models to classify long texts☆21Updated 2 years ago
- Experimentation on Google's Gemma model☆16Updated last year
- ☆41Updated last year
- Using open source LLMs to build synthetic datasets for direct preference optimization☆59Updated last year
- Sample notebooks and prompts for LLM evaluation☆124Updated 4 months ago
- 🤖 AI Assistant fine-tuned to provide support for coding and design questions based on the latest trends in the industry.☆16Updated last year
- Fine-tuning large language models (LLMs) is crucial for enhancing performance across domain-specific task applications. This comprehensiv…☆12Updated 6 months ago