Learn the ins and outs of efficiently serving Large Language Models (LLMs). Dive into optimization techniques, including KV caching and Low Rank Adapters (LoRA), and gain hands-on experience with Predibase’s LoRAX framework inference server.
☆19 · Apr 12, 2024 · Updated 2 years ago
Alternatives and similar repositories for Efficiently-Serving-LLMs
Users that are interested in Efficiently-Serving-LLMs are comparing it to the libraries listed below.
- A monolingual parallel corpus for sentence simplification ☆11 · Jul 4, 2016 · Updated 9 years ago
- This is the code for neural-Jacana aligner, and the data for MultiMWA dataset. ☆20 · Feb 12, 2023 · Updated 3 years ago
- A set of tips and tricks to assist in the Certified Kubernetes Application Developer exam by Cloud Native Computing Foundation. ☆93 · May 15, 2026 · Updated last month
- Deploy SageMaker models with Terraform ☆23 · Feb 14, 2018 · Updated 8 years ago
- Deploy, launch and use LLMs on AWS ☆16 · Jun 2, 2023 · Updated 3 years ago
- Klexikon: A German Dataset for Joint Summarization and Simplification ☆16 · Oct 5, 2022 · Updated 3 years ago
- Tutorial for LLM developers about engine design, service deployment, evaluation/benchmark, etc. Provide a C/S style optimized LLM inferen… ☆19 · Sep 5, 2023 · Updated 2 years ago
- ☆10 · Aug 18, 2021 · Updated 4 years ago
- ☆25 · May 9, 2022 · Updated 4 years ago
- Alignment and annotation for comparable documents. ☆22 · Oct 16, 2018 · Updated 7 years ago
- Constrained decoding utilities for text generation using Huggingface seq2seq models ☆25 · Jan 25, 2023 · Updated 3 years ago
- Machine Learning for Mathematics Faculty (HSE) 2018 ☆18 · Jan 23, 2022 · Updated 4 years ago
- oneNeuron PyTorch basics course docs plus code ☆10 · Mar 14, 2022 · Updated 4 years ago
- COMET for African languages ☆11 · Jan 24, 2025 · Updated last year
- Implementation of the G-CORE graph query language on Spark ☆15 · Aug 25, 2021 · Updated 4 years ago
- ☆14 · Oct 11, 2023 · Updated 2 years ago
- The Chia Network Nebula Graph database Importer ☆11 · Jan 16, 2023 · Updated 3 years ago
- Ideas on how to quickly learn to build command-line tools ☆11 · Feb 26, 2022 · Updated 4 years ago
- A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning ☆16 · Jul 20, 2020 · Updated 5 years ago
- Super Mario is a legendary game we all cherish! In this project, we will deploy Super Mario on Amazon EKS (Elastic Kubernetes Service) us… ☆11 · Feb 3, 2026 · Updated 4 months ago
- Repository for React Fundamentals classroom demonstration contacts app ☆11 · Nov 19, 2024 · Updated last year
- Community detection in complex networks using hybrid quantum annealing on Amazon Braket ☆13 · Jul 6, 2023 · Updated 2 years ago
- Official code of our work, VCSR: Mutable CSR Graph Format Using Vertex-Centric Packed Memory Array [CCGrid 2022]. ☆13 · Jun 30, 2022 · Updated 3 years ago
- Open-source evaluation toolkit of large vision-language models (LVLMs), support ~100 VLMs, 30+ benchmarks ☆15 · Feb 17, 2025 · Updated last year
- A Java library for quantum programming using Quil. ☆16 · Jul 23, 2018 · Updated 7 years ago
- Resources to learn data processing with GPT and other language models ☆21 · Dec 10, 2024 · Updated last year
- For my IBM Data Science Professional certificate capstone project in early 2020, I used pandas, the FourSquare API, Folium, and other Pyt… ☆13 · Dec 31, 2020 · Updated 5 years ago
- You’ll explore new advancements like ChatGPT’s function calling capability, and build a conversational agent using a new syntax called La… ☆16 · Oct 28, 2023 · Updated 2 years ago
- This repository will take you through creating a FastAPI StableDiffusion app (including Dockerfile) all the way to adding a new feature u… ☆38 · Nov 9, 2022 · Updated 3 years ago
- A compute framework for building Search, RAG, Recommendations and Analytics over complex (structured+unstructured) data, with ultra-modal… ☆12 · Sep 16, 2024 · Updated last year
- Unsupervised Neural Text Simplification ☆31 · Apr 14, 2021 · Updated 5 years ago
- Generate a dataset to fine-tune an LLM to generate Cypher code from questions given in natural language (English). ☆15 · May 24, 2024 · Updated 2 years ago
- Packed Memory Array ☆17 · May 14, 2014 · Updated 12 years ago
- Extracts parallel corpora from the 2 raw texts in different languages. ☆37 · Nov 4, 2022 · Updated 3 years ago
- Analysis using reduced NanoAOD files created from CMS open data producing a high statistics di-muon spectrum ☆15 · Sep 5, 2023 · Updated 2 years ago
- data related codebase for polyglot project ☆19 · Mar 30, 2023 · Updated 3 years ago
- Experimentation on Google's Gemma model ☆16 · Mar 6, 2024 · Updated 2 years ago
- CypherSmith is a random Cypher generator for OpenCypher ☆17 · Jan 24, 2022 · Updated 4 years ago
- A simple example to showcase machine learning model deployment with an API ☆10 · Mar 7, 2022 · Updated 4 years ago