kemingy / vllm-env
setup the env for vllm users
☆16 Updated 2 years ago
Alternatives and similar repositories for vllm-env
Users that are interested in vllm-env are comparing it to the libraries listed below
- A collection of reproducible inference engine benchmarks ☆38 Updated 7 months ago
- A memory efficient DLRM training solution using ColossalAI ☆106 Updated 3 years ago
- Evaluation for AI apps and agent ☆43 Updated last year
- Deploy ChatGLM on Modelz ☆16 Updated 2 years ago
- Autoscale LLM (vLLM, SGLang, LMDeploy) inferences on Kubernetes (and others) ☆278 Updated 2 years ago
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving. ☆78 Updated last year
- Evaluation of bm42 sparse indexing algorithm ☆72 Updated last year
- Sentence Embedding as a Service ☆15 Updated 5 months ago
- Benchmark suite for LLMs from Fireworks.ai ☆84 Updated last week
- Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper) ☆28 Updated 2 years ago
- ☆28 Updated 2 years ago
- ☆85 Updated 2 years ago
- ☆16 Updated last year
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ☆130 Updated 2 months ago
- ☆90 Updated last year
- OVALChat is a customizable Web app aimed at conducting user studies with chatbots ☆28 Updated last year
- Framework for benchmarking fully-managed vector databases ☆80 Updated last year
- fastertransformer for codegeex model ☆65 Updated 2 years ago
- ☆56 Updated last year
- Dynamic batching library for Deep Learning inference. Tutorials for LLM, GPT scenarios. ☆106 Updated last year
- Benchmarking suite for popular AI APIs ☆88 Updated 9 months ago
- Data preparation code for Amber 7B LLM ☆93 Updated last year
- Official repository for RAGViz: Diagnose and Visualize Retrieval-Augmented Generation [EMNLP 2024] ☆88 Updated 10 months ago
- Sky Computing: Accelerating Geo-distributed Computing in Federated Learning ☆91 Updated 3 years ago
- Open Implementations of LLM Analyses ☆107 Updated last year
- Implemented a script that automatically adjusts Qwen3's inference and non-inference capabilities, based on an OpenAI-like API. The infere… ☆22 Updated 6 months ago
- The multilingual variant of GLM, a general language model trained with autoregressive blank infilling objective ☆62 Updated 3 years ago
- Implementation of nougat that focuses on processing pdf locally. ☆83 Updated 10 months ago
- A streamlined, user-friendly JSON streaming preprocessor, crafted in Python. ☆110 Updated last year
- experiments with inference on llama ☆103 Updated last year