friendliai / friendli-client
Friendli: the fastest serving engine for generative AI
☆43 · Updated 2 months ago
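For context, friendli-client is the Python SDK for calling Friendli serving endpoints. Below is a minimal sketch of a chat-completion request, assuming the SDK's OpenAI-compatible chat interface; the token and model id are placeholders, not values from this page:

```python
# Minimal sketch of a chat-completion call with friendli-client.
# Assumes the SDK's OpenAI-style chat.completions interface;
# token and model id below are placeholders.
from friendli import Friendli

client = Friendli(token="YOUR_FRIENDLI_TOKEN")  # placeholder credential

completion = client.chat.completions.create(
    model="meta-llama-3.1-8b-instruct",  # example model id; substitute your own
    messages=[{"role": "user", "content": "Summarize what a serving engine does."}],
)
print(completion.choices[0].message.content)
```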
Alternatives and similar repositories for friendli-client:
Users interested in friendli-client are comparing it to the libraries listed below.
- FMO (Friendli Model Optimizer) ☆12 · Updated 2 months ago
- ☆45 · Updated 6 months ago
- Nexusflow function call, tool use, and agent benchmarks. ☆19 · Updated 3 months ago
- FriendliAI Model Hub ☆92 · Updated 2 years ago
- ☆22 · Updated this week
- Welcome to PeriFlow CLI ☁︎ ☆12 · Updated last year
- vLLM adapter for a TGIS-compatible gRPC server. ☆25 · Updated this week
- Sentence Embedding as a Service ☆15 · Updated last year
- How much energy do GenAI models consume? ☆42 · Updated 5 months ago
- SGLang is a fast serving framework for large language models and vision language models. ☆20 · Updated last month
- Dotfile management with bare git ☆19 · Updated last week
- AI agent that manages your Jira project ☆16 · Updated 9 months ago
- Training-free Post-training Efficient Sub-quadratic Complexity Attention. Implemented with OpenAI Triton. ☆128 · Updated this week
- A collection of available inference solutions for LLMs ☆82 · Updated last month
- Ditto is an open-source framework that enables direct conversion of HuggingFace PreTrainedModels into TensorRT-LLM engines. ☆31 · Updated this week
- ☆45 · Updated 9 months ago
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆60 · Updated 3 months ago
- ☆11 · Updated last month
- Cruise: A Distributed Machine Learning Framework with Automatic System Configuration ☆26 · Updated 6 years ago
- Tiny configuration for Triton Inference Server ☆45 · Updated 2 months ago
- ☆56 · Updated this week
- Evaluate your LLM apps, RAG pipelines, any generated text, and more! · Updated 10 months ago
- 1-Click is all you need. ☆59 · Updated 11 months ago
- Modular and structured prompt caching for low-latency LLM inference ☆89 · Updated 4 months ago
- ☆102 · Updated last year
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆87 · Updated this week
- [ICLR 2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆111 · Updated 3 months ago
- ☆31 · Updated 4 months ago
- ☆17 · Updated last week
- LLM Serving Performance Evaluation Harness ☆73 · Updated last month