Yard1 / Ray-DeepSpeed-Inference
☆17 · Updated 2 years ago
Alternatives and similar repositories for Ray-DeepSpeed-Inference
Users interested in Ray-DeepSpeed-Inference are comparing it to the libraries listed below.
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆132 · Updated last year
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving. ☆72 · Updated last year
- Inferflow is an efficient and highly configurable inference engine for large language models (LLMs). ☆248 · Updated last year
- Comparison of Language Model Inference Engines ☆229 · Updated 9 months ago
- Official repository for LongChat and LongEval ☆533 · Updated last year
- OpenAI-compatible API for the TensorRT-LLM Triton backend ☆215 · Updated last year
- Open Source WizardCoder Dataset ☆161 · Updated 2 years ago
- ☆56 · Updated 10 months ago
- Implementation of the paper "LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens" ☆149 · Updated last year
- Benchmark suite for LLMs from Fireworks.ai ☆83 · Updated this week
- A high-performance inference system for large language models, designed for production environments. ☆467 · Updated last week
- Compress your input to ChatGPT or other LLMs, letting them process 2x more content while saving 40% memory and GPU time. ☆397 · Updated last year
- [ACL 2024 Demo] Official GitHub repo for UltraEval: an open-source framework for evaluating foundation models. ☆249 · Updated 11 months ago
- A text generation method that returns a generator, streaming out each token in real time during inference, based on Huggingface/… ☆97 · Updated last year
- Benchmark baseline for retrieval QA applications ☆116 · Updated last year
- ☆298 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆266 · Updated 11 months ago
- Easy and Efficient Quantization for Transformers ☆203 · Updated 3 months ago
- Batched LoRAs ☆346 · Updated 2 years ago
- Train LLaMA on a single A100 80G node using 🤗 Transformers and 🚀 DeepSpeed pipeline parallelism ☆224 · Updated last year
- REST: Retrieval-Based Speculative Decoding (NAACL 2024) ☆209 · Updated 3 weeks ago
- LLM inference benchmark ☆426 · Updated last year
- Implementation of the paper "Data Engineering for Scaling Language Models to 128K Context" ☆473 · Updated last year
- ☆121 · Updated last year
- ☆199 · Updated 4 months ago
- Efficient, flexible, and highly fault-tolerant model service management based on SGLang ☆56 · Updated 10 months ago
- Dynamic batching library for deep learning inference, with tutorials for LLM and GPT scenarios. ☆103 · Updated last year
- Retrieves Parquet files from Hugging Face; identifies and quantifies junky data, duplication, contamination, and biased content in datase… ☆53 · Updated 2 years ago
- Ongoing research training transformer language models at scale, including BERT & GPT-2 ☆69 · Updated 2 years ago
- ☆275 · Updated 2 years ago