bentoml / llm-bench
☆33 · Updated 3 months ago
Related projects
Alternatives and complementary repositories for llm-bench
- ☆109 · Updated 7 months ago
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ☆101 · Updated last week
- Benchmark suite for LLMs from Fireworks.ai ☆58 · Updated this week
- ☆156 · Updated last month
- ☆44 · Updated last month
- Materials for learning SGLang ☆75 · Updated this week
- LLM Serving Performance Evaluation Harness ☆54 · Updated 2 months ago
- ☆189 · Updated this week
- Easy and Efficient Quantization for Transformers ☆178 · Updated 3 months ago
- A low-latency & high-throughput serving engine for LLMs ☆231 · Updated last month
- Ultra-Fast and Cheaper Long-Context LLM Inference ☆194 · Updated this week
- Comparison of Language Model Inference Engines ☆189 · Updated 2 months ago
- Modular and structured prompt caching for low-latency LLM inference ☆65 · Updated 5 months ago
- OpenAI compatible API for TensorRT LLM triton backend ☆174 · Updated 3 months ago
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆51 · Updated 2 months ago
- ☆114 · Updated 6 months ago
- Applied AI experiments and examples for PyTorch ☆159 · Updated last week
- A large-scale simulation framework for LLM inference ☆271 · Updated last month
- Inferflow is an efficient and highly configurable inference engine for large language models (LLMs). ☆236 · Updated 7 months ago
- Dynamic Memory Management for Serving LLMs without PagedAttention ☆222 · Updated this week
- Production-ready LLM model compression/quantization toolkit with accelerated inference support for both CPU/GPU via HF, vLLM, and SGLang. ☆118 · Updated this week
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆251 · Updated this week
- Compare different hardware platforms via the Roofline Model for LLM inference tasks. ☆74 · Updated 7 months ago
- LLM Inference benchmark ☆349 · Updated 3 months ago
- A throughput-oriented high-performance serving framework for LLMs ☆629 · Updated last month
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆163 · Updated this week
- Latency and Memory Analysis of Transformer Models for Training and Inference ☆352 · Updated 5 months ago
- Experiments with inference on Llama ☆105 · Updated 5 months ago
- Simple implementation of Speculative Sampling in NumPy for GPT-2. ☆89 · Updated last year
- Dynamic batching library for Deep Learning inference. Tutorials for LLM, GPT scenarios. ☆85 · Updated 2 months ago