predibase / lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
☆2,802 · Updated 2 weeks ago
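LoRAX serves many fine-tuned LoRA adapters on top of one shared base model, and each generation request names the adapter to apply. As a minimal sketch, assuming the documented `/generate` REST route on the default port 8080 (the adapter id and prompt below are hypothetical placeholders):

```python
# Sketch of building a LoRAX /generate request with an adapter_id.
# Assumes a server at 127.0.0.1:8080; adapter id is a placeholder.
import json
import urllib.request


def build_generate_request(prompt, adapter_id, max_new_tokens=64,
                           base_url="http://127.0.0.1:8080"):
    """Build an HTTP POST request for LoRAX's /generate route.

    The `adapter_id` parameter tells the server which fine-tuned
    LoRA adapter to load and apply on top of the base model.
    """
    payload = {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "adapter_id": adapter_id,
        },
    }
    return urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Build (but do not send) a request targeting a hypothetical adapter.
req = build_generate_request("What is LoRA?", "some-org/my-lora-adapter")
```

Sending the request (e.g. with `urllib.request.urlopen(req)`) would return a JSON body containing the generated text; omitting `adapter_id` falls back to the base model.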
Alternatives and similar repositories for lorax:
Users interested in lorax are comparing it to the libraries listed below.
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆2,568 · Updated this week
- Tools for merging pretrained large language models. ☆5,458 · Updated this week
- Freeing data processing from scripting madness by providing a set of platform-agnostic, customizable pipeline processing blocks. ☆2,312 · Updated this week
- Enforce the output format (JSON Schema, Regex, etc.) of a language model ☆1,742 · Updated 3 weeks ago
- Go ahead and axolotl questions ☆8,928 · Updated this week
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters ☆1,801 · Updated last year
- Minimalistic large language model 3D-parallelism training ☆1,701 · Updated this week
- Easily use and train state-of-the-art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-… ☆3,328 · Updated last month
- PyTorch native post-training library ☆5,014 · Updated this week
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆1,313 · Updated this week
- Stanford NLP Python library for Representation Finetuning (ReFT) ☆1,445 · Updated last month
- AllenAI's post-training codebase ☆2,827 · Updated this week
- [EMNLP'23, ACL'24] To speed up LLM inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, which ach… ☆4,957 · Updated last week
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation: ☆2,021 · Updated 2 weeks ago
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆1,762 · Updated 7 months ago
- TextGrad: Automatic "Differentiation" via Text -- using large language models to backpropagate textual gradients. ☆2,271 · Updated this week
- A fast inference library for running LLMs locally on modern consumer-class GPUs ☆4,053 · Updated last week
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM ☆1,103 · Updated this week
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ☆1,338 · Updated last month
- Efficient Retrieval Augmentation and Generation Framework ☆1,489 · Updated 2 months ago
- Optimizing inference proxy for LLMs ☆2,110 · Updated this week
- A blazing-fast inference solution for text embedding models ☆3,321 · Updated this week
- Robust recipes to align language models with human and AI preferences ☆5,072 · Updated 4 months ago
- Large-scale LLM inference engine ☆1,355 · Updated this week
- Curated list of datasets and tools for post-training. ☆2,844 · Updated last month
- Infinity is a high-throughput, low-latency serving engine for text embeddings, reranking models, CLIP, CLAP, and ColPali ☆1,924 · Updated last week
- A framework for few-shot evaluation of language models. ☆8,337 · Updated this week
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. ☆1,990 · Updated this week
- A library for advanced large language model reasoning ☆2,060 · Updated last month