Yard1 / Ray-DeepSpeed-Inference
☆17Updated 2 years ago
Alternatives and similar repositories for Ray-DeepSpeed-Inference
Users that are interested in Ray-DeepSpeed-Inference are comparing it to the libraries listed below
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving.☆68Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs☆131Updated last year
- Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).☆243Updated last year
- Implementation of the LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper☆147Updated 11 months ago
- Official repository for LongChat and LongEval☆523Updated last year
- Comparison of Language Model Inference Engines☆219Updated 7 months ago
- This is a text generation method which returns a generator, streaming out each token in real-time during inference, based on Huggingface/…☆95Updated last year
- Compress your input to ChatGPT or other LLMs, to let them process 2x more content and save 40% memory and GPU time.☆387Updated last year
- Deploy a light and full OpenAI API for production with vLLM, supporting /v1/embeddings with all embedding models.☆42Updated last year
- CodeLLaMA Chinese edition: a code generation assistant with over 20,000 cumulative downloads on Hugging Face☆45Updated last year
- ☆270Updated 2 years ago
- Benchmark suite for LLMs from Fireworks.ai☆76Updated this week
- OpenAI compatible API for TensorRT LLM triton backend☆209Updated 11 months ago
- train llama on a single A100 80G node using 🤗 transformers and 🚀 Deepspeed Pipeline Parallelism☆223Updated last year
- Experiments on speculative sampling with Llama models☆128Updated 2 years ago
- ☆55Updated 7 months ago
- fastertransformer for codegeex model☆63Updated 2 years ago
- Dynamic batching library for Deep Learning inference. Tutorials for LLM, GPT scenarios.☆99Updated 11 months ago
- Efficient, Flexible, and Highly Fault-Tolerant Model Service Management Based on SGLang☆54Updated 8 months ago
- Retrieves parquet files from Hugging Face, identifies and quantifies junky data, duplication, contamination, and biased content in datase…☆53Updated 2 years ago
- Benchmark baseline for retrieval qa applications☆115Updated last year
- [ACL 2024] This is the code repo for our ACL’24 paper "Cleaner Pretraining Corpus Curation with Neural Web Scraping".☆226Updated 10 months ago
- Open Source WizardCoder Dataset☆159Updated 2 years ago
- vLLM Router☆31Updated last year
- [ACL 2024 Demo] Official GitHub repo for UltraEval: An open source framework for evaluating foundation models.☆244Updated 8 months ago
- Implementation of Speculative Sampling as described in "Accelerating Large Language Model Decoding with Speculative Sampling" by DeepMind☆98Updated last year
- batched loras☆344Updated last year
- An open source ChatGPT UI for ToolLlama☆28Updated last year
- The data processing pipeline for the Koala chatbot language model☆117Updated 2 years ago
- Evaluation for AI apps and agents☆42Updated last year