friendliai / friendli-model-optimizer
FMO (Friendli Model Optimizer)
☆13 · Updated last year
Alternatives and similar repositories for friendli-model-optimizer
Users interested in friendli-model-optimizer are comparing it to the libraries listed below.
- ☆48 · Updated last year
- [⛔️ DEPRECATED] Friendli: the fastest serving engine for generative AI ☆49 · Updated 7 months ago
- Welcome to PeriFlow CLI ☁︎ ☆12 · Updated 2 years ago
- FriendliAI Model Hub ☆90 · Updated 3 years ago
- vLLM plugin for RBLN NPU ☆41 · Updated this week
- ☆24 · Updated 7 years ago
- A performance library for machine learning applications. ☆184 · Updated 2 years ago
- ☆61 · Updated last month
- QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference ☆119 · Updated last year
- Ditto is an open-source framework that enables direct conversion of HuggingFace PreTrainedModels into TensorRT-LLM engines. ☆55 · Updated 6 months ago
- ☆103 · Updated 2 years ago
- ☆56 · Updated last year
- ☆26 · Updated 3 years ago
- ☆15 · Updated 4 years ago
- Easy and Efficient Quantization for Transformers ☆202 · Updated 7 months ago
- Lightweight and Parallel Deep Learning Framework ☆263 · Updated 3 years ago
- ☆81 · Updated 8 months ago
- ☆27 · Updated 2 years ago
- PyTorch CoreSIG ☆57 · Updated last year
- Dynamic Memory Management for Serving LLMs without PagedAttention ☆457 · Updated 8 months ago
- MIST: High-performance IoT Stream Processing ☆18 · Updated 6 years ago
- Training-free Post-training Efficient Sub-quadratic Complexity Attention. Implemented with OpenAI Triton. ☆148 · Updated 2 months ago
- Study Group of Deep Learning Compiler ☆166 · Updated 3 years ago
- ☆19 · Updated last year
- ☆90 · Updated last year
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆93 · Updated last year
- Example code for RBLN SDK developers building inference applications ☆30 · Updated this week
- ☆12 · Updated last year
- A low-latency & high-throughput serving engine for LLMs ☆470 · Updated 3 weeks ago
- llama3.cuda is a pure C/CUDA implementation of the Llama 3 model. ☆350 · Updated 9 months ago