punica-ai / punica
Serving multiple LoRA-finetuned LLMs as one
☆1,139 · May 8, 2024 · Updated last year
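To make the tagline concrete, here is a minimal PyTorch sketch of what "serving multiple LoRA-finetuned LLMs as one" means at the level of a single linear layer: every request in a batch shares the base weight, and only a small per-request low-rank update differs. The function name, tensor shapes, and the naive gather-plus-bmm approach are illustrative assumptions; punica's actual implementation uses custom CUDA kernels for this step rather than dense batched matmuls.

```python
import torch

def batched_lora_forward(
    x: torch.Tensor,            # (batch, d_in) activations, one token per request
    W: torch.Tensor,            # (d_in, d_out) base weight shared by all requests
    lora_A: torch.Tensor,       # (n_adapters, d_in, r) stacked LoRA "A" matrices
    lora_B: torch.Tensor,       # (n_adapters, r, d_out) stacked LoRA "B" matrices
    adapter_ids: torch.Tensor,  # (batch,) which adapter each request uses
    scaling: float = 1.0,
) -> torch.Tensor:
    # One dense matmul against the shared base weight for the whole batch.
    y = x @ W
    # Gather each request's adapter and apply its low-rank update.
    A = lora_A[adapter_ids]                              # (batch, d_in, r)
    B = lora_B[adapter_ids]                              # (batch, r, d_out)
    delta = torch.bmm(torch.bmm(x.unsqueeze(1), A), B)   # (batch, 1, d_out)
    return y + scaling * delta.squeeze(1)

# Tiny usage example with made-up sizes: 4 requests, 3 adapters, rank 8.
x = torch.randn(4, 64)
W = torch.randn(64, 64)
lora_A = torch.randn(3, 64, 8)
lora_B = torch.randn(3, 8, 64)
adapter_ids = torch.tensor([0, 2, 2, 1])   # each request picks its own adapter
out = batched_lora_forward(x, W, lora_A, lora_B, adapter_ids)
print(out.shape)                           # torch.Size([4, 64])
```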
Alternatives and similar repositories for punica
Users interested in punica are comparing it to the libraries listed below.
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters ☆1,897 · Jan 21, 2024 · Updated 2 years ago
- Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs ☆3,719 · May 21, 2025 · Updated 8 months ago
- FlashInfer: Kernel Library for LLM Serving ☆4,983 · Updated this week
- LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability… ☆3,896 · Updated this week
- Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training ☆1,860 · Updated this week
- FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.