hpcaitech / EnergonAI
Large-scale model inference.
☆629 · Updated last year
Alternatives and similar repositories for EnergonAI:
Users interested in EnergonAI are comparing it to the libraries listed below.
- Fast Inference Solutions for BLOOM · ☆561 · Updated 6 months ago
- Efficient Training (including pre-training and fine-tuning) for Big Models · ☆584 · Updated this week
- Examples of training models with hybrid parallelism using ColossalAI · ☆339 · Updated 2 years ago
- ☆411 · Updated last year
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 · ☆1,386 · Updated last year
- ☆543 · Updated 4 months ago
- Scalable PaLM implementation of PyTorch · ☆190 · Updated 2 years ago
- ☆214 · Updated last year
- optimized BERT transformer inference on NVIDIA GPU. https://arxiv.org/abs/2210.03052 · ☆472 · Updated last year
- LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training · ☆402 · Updated this week
- Efficient AI Inference & Serving · ☆471 · Updated last year
- Microsoft Automatic Mixed Precision Library · ☆593 · Updated 6 months ago
- Official repository for LongChat and LongEval · ☆519 · Updated 11 months ago
- Running BERT without Padding · ☆471 · Updated 3 years ago
- Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data. · ☆992 · Updated 8 months ago
- GPTQ inference Triton kernel · ☆300 · Updated last year
- LOMO: LOw-Memory Optimization · ☆985 · Updated 9 months ago
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. · ☆2,002 · Updated last month
- PatrickStar enables Larger, Faster, Greener Pretrained Models for NLP and democratizes AI for everyone. · ☆758 · Updated 2 years ago
- Serving multiple LoRA finetuned LLM as one · ☆1,054 · Updated 11 months ago
- Ongoing research training transformer models at scale · ☆386 · Updated 8 months ago
- Efficient Inference for Big Models · ☆581 · Updated 2 years ago
- Best practice for training LLaMA models in Megatron-LM · ☆649 · Updated last year
- train llama on a single A100 80G node using 🤗 transformers and 🚀 Deepspeed Pipeline Parallelism · ☆218 · Updated last year
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding · ☆1,242 · Updated last month
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 · ☆2,055 · Updated last month
- LLaMa/RWKV onnx models, quantization and testcase · ☆361 · Updated last year
- Crosslingual Generalization through Multitask Finetuning · ☆531 · Updated 7 months ago
- Code used for sourcing and cleaning the BigScience ROOTS corpus · ☆309 · Updated 2 years ago
- Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training. · ☆267 · Updated 2 years ago