hpcaitech / PaLM-colossalai
Scalable PaLM implementation in PyTorch
☆192 · Updated 2 years ago
Alternatives and similar repositories for PaLM-colossalai:
Users interested in PaLM-colossalai are comparing it to the libraries listed below:
- Official repository for LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers ☆204 · Updated 5 months ago
- Performance benchmarking with ColossalAI ☆39 · Updated 2 years ago
- ☆106 · Updated last year
- ☆114 · Updated 10 months ago
- GPTQ inference Triton kernel ☆292 · Updated last year
- Fast Inference Solutions for BLOOM ☆563 · Updated 3 months ago
- ☆96 · Updated last year
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆102 · Updated 2 months ago
- Examples of training models with hybrid parallelism using ColossalAI ☆337 · Updated last year
- Running BERT without Padding ☆468 · Updated 2 years ago
- 📑 Dive into Big Model Training ☆110 · Updated 2 years ago
- REST: Retrieval-Based Speculative Decoding, NAACL 2024 ☆190 · Updated last month
- A unified tokenization tool for images, Chinese, and English ☆151 · Updated last year
- ☆411 · Updated last year
- Code used for sourcing and cleaning the BigScience ROOTS corpus ☆307 · Updated last year
- A (somewhat) minimal library for finetuning language models with PPO on human feedback ☆86 · Updated 2 years ago
- Open Instruction Generalist is an assistant trained on massive synthetic instructions to perform many millions of tasks ☆207 · Updated last year
- Large Scale Distributed Model Training strategy with Colossal AI and Lightning AI ☆58 · Updated last year
- ☆58 · Updated 8 months ago
- PyTorch bindings for CUTLASS grouped GEMM ☆86 · Updated 3 weeks ago
- [ACL 2022] Structured Pruning Learns Compact and Accurate Models (https://arxiv.org/abs/2204.00408) ☆192 · Updated last year
- Techniques used to run BLOOM at inference in parallel ☆37 · Updated 2 years ago
- Train LLaMA on a single A100 80G node using 🤗 Transformers and 🚀 DeepSpeed pipeline parallelism ☆213 · Updated last year
- Code repo for the paper "LLM-QAT: Data-Free Quantization Aware Training for Large Language Models" ☆266 · Updated 4 months ago
- Zero Bubble Pipeline Parallelism ☆317 · Updated 2 months ago
- USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long-Context Transformer Model Training and Inference ☆415 · Updated 3 weeks ago
- DSIR: a large-scale data selection framework for language model training ☆242 · Updated 9 months ago
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ☆69 · Updated last year
- ☆97 · Updated 5 months ago
- Simple implementation of Speculative Sampling in NumPy for GPT-2 ☆90 · Updated last year