hpcaitech / EnergonAI

Large-scale model inference.

☆631

Alternatives and similar repositories for EnergonAI

Users who are interested in EnergonAI are comparing it to the libraries listed below.

triton-inference-server / fastertransformer_backend
☆413 Updated last year
huggingface / transformers-bloom-inference
Fast Inference Solutions for BLOOM
☆565 Updated last year
hpcaitech / ColossalAI-Examples
Examples of training models with hybrid parallelism using ColossalAI
☆339 Updated 2 years ago
Vahe1994 / SpQR
☆546 Updated 10 months ago
Oneflow-Inc / libai
LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training
☆407 Updated 2 months ago
volcengine / veGiantModel
☆219 Updated 2 years ago
hpcaitech / SwiftInfer
Efficient AI Inference & Serving
☆478 Updated last year
Tencent / PatrickStar
PatrickStar enables Larger, Faster, Greener Pretrained Models for NLP and democratizes AI for everyone.
☆766 Updated 2 years ago
OpenBMB / BMTrain
Efficient Training (including pre-training and fine-tuning) for Big Models
☆611 Updated last month
bigscience-workshop / Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
☆1,420 Updated last year
Azure / MS-AMP
Microsoft Automatic Mixed Precision Library
☆626 Updated last year
bytedance / ByteTransformer
Optimized BERT transformer inference on NVIDIA GPU. https://arxiv.org/abs/2210.03052
☆479 Updated last year
fpgaminer / GPTQ-triton
GPTQ inference Triton kernel
☆310 Updated 2 years ago
bytedance / effective_transformer
Running BERT without Padding
☆475 Updated 3 years ago
hpcaitech / PaLM-colossalai
Scalable PaLM implementation in PyTorch
☆188 Updated 2 years ago
vectorch-ai / ScaleLLM
A high-performance inference system for large language models, designed for production environments.
☆479 Updated last week
deepspeedai / DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
☆2,067 Updated 3 months ago
DachengLi1 / LongChat
Official repository for LongChat and LongEval
☆532 Updated last year
alibaba / Megatron-LLaMA
Best practice for training LLaMA models in Megatron-LM
☆660 Updated last year
OpenBMB / BMInf
Efficient Inference for Big Models
☆588 Updated 2 years ago
bigscience-workshop / bigscience
Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.
☆1,004 Updated last year
tpoisonooo / llama.onnx
LLaMa/RWKV onnx models, quantization and testcase
☆367 Updated 2 years ago
open-compass / MixtralKit
A toolkit for inference and evaluation of 'mixtral-8x7b-32kseqlen' from Mistral AI
☆770 Updated last year
THUDM / FasterTransformer
Transformer related optimization, including BERT, GPT
☆39 Updated 2 years ago
punica-ai / punica
Serving multiple LoRA finetuned LLM as one
☆1,101 Updated last year
flexflow / flexflow-train
Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training
☆1,837 Updated this week
BlackSamorez / tensor_parallel
Automatically split your PyTorch models on multiple GPUs for training & inference
☆657 Updated last year
OpenPPL / ppl.llm.serving
☆129 Updated 9 months ago
SqueezeAILab / SqueezeLLM
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
☆704 Updated last year
sambanova / bloomchat
This repo contains the data preparation, tokenization, training and inference code for BLOOMChat. BLOOMChat is a 176 billion parameter mu…
☆586 Updated 2 years ago