friendliai / friendli-client
Friendli: the fastest serving engine for generative AI
☆44 · Updated 3 months ago
Alternatives and similar repositories for friendli-client
Users interested in friendli-client are comparing it to the libraries listed below.
- FMO (Friendli Model Optimizer) ☆12 · Updated 4 months ago
- ☆46 · Updated 8 months ago
- Nexusflow function call, tool use, and agent benchmarks. ☆19 · Updated 5 months ago
- FriendliAI Model Hub ☆92 · Updated 2 years ago
- Tutorial to get started with SkyPilot! ☆57 · Updated last year
- Sentence Embedding as a Service ☆15 · Updated last year
- Accelerated inference of 🤗 models using FuriosaAI NPU chips. ☆26 · Updated 11 months ago
- Ditto is an open-source framework that enables direct conversion of HuggingFace PreTrainedModels into TensorRT-LLM engines. ☆41 · Updated this week
- ☆60 · Updated last month
- Welcome to PeriFlow CLI ☁︎ ☆12 · Updated last year
- ☆31 · Updated last year
- Training-free Post-training Efficient Sub-quadratic Complexity Attention, implemented with OpenAI Triton. ☆130 · Updated this week
- This repository is designed for deploying and managing server processes that handle embeddings using the Infinity Embedding model or Larg… ☆22 · Updated 2 months ago
- A collection of all available inference solutions for LLMs ☆87 · Updated 2 months ago
- A collection of reproducible inference engine benchmarks ☆30 · Updated 3 weeks ago
- Tools for formatting large language model prompts. ☆13 · Updated last year
- Visualize expert firing frequencies across sentences in the Mixtral MoE model ☆17 · Updated last year
- vLLM adapter for a TGIS-compatible gRPC server. ☆29 · Updated this week
- ☆25 · Updated 3 weeks ago
- ☆24 · Updated this week
- ☆15 · Updated last month
- AI-based search engine done right ☆16 · Updated this week
- ☆19 · Updated last month
- Command-line script for running inference with models such as LLaMA in a chat scenario, with LoRA adaptations ☆33 · Updated last year
- ☆32 · Updated this week
- SGLang is a fast serving framework for large language models and vision-language models. ☆23 · Updated 3 months ago
- ☆45 · Updated 10 months ago
- Tiny configuration for Triton Inference Server ☆45 · Updated 4 months ago
- Creating generative AI apps that work ☆17 · Updated last month
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆116 · Updated 5 months ago