Yard1 / Ray-DeepSpeed-Inference
☆ 17 · Updated 2 years ago
Alternatives and similar repositories for Ray-DeepSpeed-Inference
Users interested in Ray-DeepSpeed-Inference are comparing it to the libraries listed below.
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving. ☆ 70 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs. ☆ 132 · Updated last year
- Official repository for LongChat and LongEval. ☆ 527 · Updated last year
- Inferflow is an efficient and highly configurable inference engine for large language models (LLMs). ☆ 246 · Updated last year
- OpenAI-compatible API for the TensorRT-LLM Triton backend. ☆ 213 · Updated last year
- Comparison of language model inference engines. ☆ 228 · Updated 8 months ago
- Benchmark suite for LLMs from Fireworks.ai. ☆ 79 · Updated 3 weeks ago
- ☆ 55 · Updated 9 months ago
- ☆ 195 · Updated 3 months ago
- A high-performance inference system for large language models, designed for production environments. ☆ 460 · Updated 3 weeks ago
- Implementation of the paper "LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens". ☆ 149 · Updated last year
- Batched LoRAs. ☆ 345 · Updated last year
- LLM inference benchmark. ☆ 426 · Updated last year
- Open-source WizardCoder dataset. ☆ 160 · Updated 2 years ago
- Efficient AI inference and serving. ☆ 473 · Updated last year
- [ACL 2024 Demo] Official GitHub repo for UltraEval: an open-source framework for evaluating foundation models. ☆ 246 · Updated 9 months ago
- Lightweight local website for displaying performance of different chat models. ☆ 87 · Updated last year
- ☆ 120 · Updated last year
- Experiments on speculative sampling with Llama models. ☆ 128 · Updated 2 years ago
- Implementation of the paper "Data Engineering for Scaling Language Models to 128K Context". ☆ 470 · Updated last year
- Instruct-tune LLaMA on consumer hardware. ☆ 42 · Updated 2 years ago
- Train LLaMA on a single A100 80G node using 🤗 Transformers and 🚀 DeepSpeed pipeline parallelism. ☆ 224 · Updated last year
- Efficient, flexible, and highly fault-tolerant model service management based on SGLang. ☆ 55 · Updated 9 months ago
- ☆ 21 · Updated 2 years ago
- ☆ 289 · Updated 2 weeks ago
- Dynamic batching library for deep learning inference, with tutorials for LLM and GPT scenarios. ☆ 102 · Updated last year
- ☆ 271 · Updated 2 years ago
- ☆ 124 · Updated last year
- [ACL'24 Outstanding] Data and code for L-Eval, a comprehensive evaluation benchmark for long-context language models. ☆ 389 · Updated last year
- Compress your input to ChatGPT or other LLMs, letting them process 2x more content while saving 40% memory and GPU time. ☆ 394 · Updated last year