predibase / lorax

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

☆3,518

Alternatives and similar repositories for lorax

Users who are interested in lorax are comparing it to the libraries listed below.

argilla-io / distilabel
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi…
☆2,912 Updated this week
S-LoRA / S-LoRA
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
☆1,864 Updated last year
IntelLabs / fastRAG
Efficient Retrieval Augmentation and Generation Framework
☆1,736 Updated 9 months ago
huggingface / datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
☆2,687 Updated 2 weeks ago
vllm-project / llm-compressor
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
☆2,149 Updated this week
arcee-ai / mergekit
Tools for merging pretrained large language models.
☆6,394 Updated last month
noamgat / lm-format-enforcer
Enforce the output format (JSON Schema, Regex etc) of a language model
☆1,942 Updated 2 months ago
huggingface / lighteval
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
☆2,021 Updated last week
michaelfeil / infinity
Infinity is a high-throughput, low-latency serving engine for text embeddings, reranking models, CLIP, CLAP and ColPali
☆2,513 Updated 3 weeks ago
huggingface / nanotron
Minimalistic large language model 3D-parallelism training
☆2,274 Updated last month
AnswerDotAI / RAGatouille
Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-…
☆3,723 Updated 5 months ago
mistralai / mistral-finetune
☆3,037 Updated last year
meta-pytorch / torchtune
PyTorch native post-training library
☆5,547 Updated last week
microsoft / LLMLingua
[EMNLP'23, ACL'24] To speed up LLM inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, which ach…
☆5,520 Updated this week
mlabonne / llm-datasets
Curated list of datasets and tools for post-training.
☆3,810 Updated 3 months ago
casper-hansen / AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
☆2,260 Updated 5 months ago
lm-sys / RouteLLM
A framework for serving and evaluating LLM routers - save LLM costs without compromising quality
☆4,365 Updated last year
gkamradt / LLMTest_NeedleInAHaystack
Doing simple retrieval from LLM models at various context lengths to measure accuracy
☆2,060 Updated last year
axolotl-ai-cloud / axolotl
Go ahead and axolotl questions
☆10,673 Updated this week
ray-project / llmperf
LLMPerf is a library for validating and benchmarking LLMs
☆1,032 Updated 10 months ago
AnswerDotAI / rerankers
A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models.
☆1,562 Updated 5 months ago
allenai / open-instruct
AllenAI's post-training codebase
☆3,263 Updated last week
MeetKai / functionary
Chat language model that can use tools and interpret the results
☆1,586 Updated this week
FasterDecoding / Medusa
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
☆2,646 Updated last year
NVIDIA-NeMo / Curator
Scalable data preprocessing and curation toolkit for LLMs
☆1,188 Updated this week
bespokelabsai / curator
Synthetic data curation for post-training and structured data extraction
☆1,537 Updated 3 months ago
codelion / optillm
Optimizing inference proxy for LLMs
☆3,042 Updated 2 weeks ago
huggingface / text-embeddings-inference
A blazing fast inference solution for text embeddings models
☆4,131 Updated 3 weeks ago
marella / ctransformers
Python bindings for the Transformer models implemented in C/C++ using GGML library.
☆1,876 Updated last year
zou-group / textgrad
TextGrad: Automatic "Differentiation" via Text -- using large language models to backpropagate textual gradients. Published in Nature.
☆3,027 Updated 3 months ago