predibase / lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
☆2,802 Updated 2 weeks ago
Alternatives and similar repositories for lorax:
Users interested in lorax are comparing it to the libraries listed below.
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆2,568 Updated this week
- Enforce the output format (JSON Schema, Regex etc) of a language model ☆1,736 Updated 3 weeks ago
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ☆2,312 Updated this week
- Tools for merging pretrained large language models. ☆5,458 Updated this week
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters ☆1,801 Updated last year
- PyTorch native post-training library ☆5,014 Updated this week
- Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-… ☆3,321 Updated last month
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆1,313 Updated this week
- Minimalistic large language model 3D-parallelism training ☆1,701 Updated this week
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. ☆2,021 Updated 2 weeks ago
- AllenAI's post-training codebase ☆2,827 Updated this week
- Curated list of datasets and tools for post-training. ☆2,844 Updated last month
- ☆2,889 Updated 6 months ago
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. ☆1,990 Updated this week
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆1,762 Updated 7 months ago
- A blazing fast inference solution for text embeddings models ☆3,321 Updated this week
- Efficient Retrieval Augmentation and Generation Framework ☆1,489 Updated 2 months ago
- Infinity is a high-throughput, low-latency serving engine for text-embeddings, reranking models, clip, clap and colpali ☆1,924 Updated this week
- Fast, Accurate, Lightweight Python library to make State of the Art Embedding ☆1,882 Updated this week
- TextGrad: Automatic "Differentiation" via Text -- using large language models to backpropagate textual gradients. ☆2,141 Updated this week
- A framework for serving and evaluating LLM routers - save LLM costs without compromising quality ☆3,729 Updated 7 months ago
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM ☆1,103 Updated this week
- Large-scale LLM inference engine ☆1,342 Updated this week
- Optimizing inference proxy for LLMs ☆2,110 Updated this week
- DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤 ☆990 Updated last month
- ☆832 Updated 6 months ago
- Serving multiple LoRA finetuned LLM as one ☆1,040 Updated 10 months ago
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ☆1,338 Updated last month
- Stanford NLP Python library for Representation Finetuning (ReFT) ☆1,445 Updated last month
- Python bindings for the Transformer models implemented in C/C++ using GGML library. ☆1,852 Updated last year