predibase / lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
☆3,417 · Updated 3 months ago
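lorax serves one shared base model and swaps LoRA adapters per request, which is what lets it scale to thousands of fine-tunes. A minimal sketch of that usage pattern, assuming lorax's documented `/generate` request shape (`inputs` plus a `parameters.adapter_id` field) — the helper name and adapter IDs below are illustrative, not from the project:

```python
import json

# Sketch: per-request adapter selection against a multi-LoRA server such as lorax.
# The payload shape (inputs + parameters.adapter_id) follows lorax's documented
# /generate API, but is reproduced from memory -- verify against the project docs.

def build_generate_payload(prompt, adapter_id=None, max_new_tokens=64):
    """Build a JSON payload for POST /generate; adapter_id picks the fine-tune."""
    parameters = {"max_new_tokens": max_new_tokens}
    if adapter_id is not None:
        # The base model is shared; only the named LoRA adapter changes per call.
        parameters["adapter_id"] = adapter_id
    return {"inputs": prompt, "parameters": parameters}

# Two requests to the same server, each routed to a different fine-tune:
sql_req = build_generate_payload("Translate to SQL: top 5 users", "org/sql-lora")
chat_req = build_generate_payload("Hi!", "org/chat-lora")
base_req = build_generate_payload("Hi!")  # no adapter -> base model answers

print(json.dumps(sql_req))
```

Omitting `adapter_id` falls through to the base model, so one deployment can serve both raw and fine-tuned traffic.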
Alternatives and similar repositories for lorax
Users interested in lorax are comparing it to the libraries listed below.
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆2,878 · Updated last week
- Enforce the output format (JSON Schema, Regex, etc.) of a language model ☆1,918 · Updated 3 weeks ago
- Infinity is a high-throughput, low-latency serving engine for text embeddings, reranking models, CLIP, CLAP, and ColPali ☆2,434 · Updated last week
- Tools for merging pretrained large language models. ☆6,275 · Updated last month
- Freeing data processing from scripting madness by providing a set of platform-agnostic, customizable pipeline processing blocks. ☆2,616 · Updated 3 weeks ago
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters ☆1,853 · Updated last year
- Easily use and train state-of-the-art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-… ☆3,662 · Updated 4 months ago
- Go ahead and axolotl questions ☆10,405 · Updated this week
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ☆1,535 · Updated 3 months ago
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM ☆1,950 · Updated this week
- Minimalistic large language model 3D-parallelism training ☆2,191 · Updated 2 weeks ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆1,890 · Updated this week
- Efficient Retrieval Augmentation and Generation Framework ☆1,662 · Updated 8 months ago
- Fast, accurate, lightweight Python library to make state-of-the-art embeddings ☆2,371 · Updated 2 weeks ago
- A framework for serving and evaluating LLM routers - save LLM costs without compromising quality ☆4,274 · Updated last year
- Large-scale LLM inference engine ☆1,548 · Updated this week
- Optimizing inference proxy for LLMs ☆2,863 · Updated this week
- ☆3,016 · Updated last year
- The easiest way to deploy agents, MCP servers, models, RAG, pipelines, and more. No MLOps. No YAML. ☆3,551 · Updated this week
- A blazing-fast inference solution for text embedding models ☆4,014 · Updated this week
- Scalable data pre-processing and curation toolkit for LLMs ☆1,145 · Updated this week
- Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard a… ☆1,568 · Updated 8 months ago
- Curated list of datasets and tools for post-training. ☆3,694 · Updated last month
- PyTorch native post-training library ☆5,484 · Updated this week
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. ☆2,247 · Updated 4 months ago
- ☆1,976 · Updated this week
- [EMNLP'23, ACL'24] To speed up LLM inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, which ach… ☆5,423 · Updated 6 months ago
- ☆1,077 · Updated last year
- Stanford NLP Python library for Representation Finetuning (ReFT) ☆1,512 · Updated 7 months ago
- Tool for generating high-quality synthetic datasets ☆1,183 · Updated last month