predibase / lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
☆3,165 · Updated last month
Alternatives and similar repositories for lorax
Users interested in lorax are comparing it to the libraries listed below.
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆2,788 · Updated last week
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ☆2,434 · Updated last week
- Tools for merging pretrained large language models. ☆5,937 · Updated 2 weeks ago
- Minimalistic large language model 3D-parallelism training ☆1,956 · Updated last week
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters ☆1,838 · Updated last year
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆1,670 · Updated this week
- Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-… ☆3,550 · Updated last month
- Infinity is a high-throughput, low-latency serving engine for text-embeddings, reranking models, clip, clap and colpali ☆2,271 · Updated last week
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation: ☆2,200 · Updated last month
- PyTorch native post-training library ☆5,296 · Updated this week
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM ☆1,587 · Updated this week
- Enforce the output format (JSON Schema, Regex etc) of a language model ☆1,831 · Updated 4 months ago
- Robust recipes to align language models with human and AI preferences ☆5,241 · Updated 2 months ago
- Fast, Accurate, Lightweight Python library to make State of the Art Embedding ☆2,175 · Updated last week
- A blazing fast inference solution for text embeddings models ☆3,758 · Updated this week
- Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard a… ☆1,448 · Updated 5 months ago
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆1,912 · Updated 10 months ago
- Knowledge Agents and Management in the Cloud ☆4,035 · Updated this week
- Optimizing inference proxy for LLMs ☆2,589 · Updated this week
- ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23) ☆3,460 · Updated 2 months ago
- AllenAI's post-training codebase ☆3,033 · Updated this week
- Synthetic data curation for post-training and structured data extraction ☆1,419 · Updated last week
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration ☆3,113 · Updated 3 weeks ago
- [EMNLP'23, ACL'24] To speed up LLMs' inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, which ach… ☆5,201 · Updated 3 months ago
- Go ahead and axolotl questions ☆9,810 · Updated this week
- Serving multiple LoRA finetuned LLM as one ☆1,067 · Updated last year
- LMDeploy is a toolkit for compressing, deploying, and serving LLMs. ☆6,614 · Updated this week
- Efficient Retrieval Augmentation and Generation Framework ☆1,580 · Updated 5 months ago
- YaRN: Efficient Context Window Extension of Large Language Models ☆1,507 · Updated last year