void-main / fastertransformer_backend
☆21, updated last year
Related projects
Alternatives and complementary repositories for fastertransformer_backend
- Transformer related optimization, including BERT, GPT ☆60, updated last year
- Transformer related optimization, including BERT, GPT ☆39, updated last year
- Ongoing research training transformer language models at scale, including BERT & GPT-2 ☆69, updated last year
- Train LLMs (BLOOM, LLaMA, Baichuan2-7B, ChatGLM3-6B) with DeepSpeed pipeline mode; faster than ZeRO/ZeRO++/FSDP ☆90, updated 9 months ago
- A LLaMA/LLaMA2 Megatron implementation ☆27, updated 11 months ago
- Transformer related optimization, including BERT, GPT ☆17, updated last year
- A flexible and efficient training framework for large-scale alignment tasks ☆206, updated this week
- An easy-to-use package for implementing SmoothQuant for LLMs ☆83, updated 6 months ago
- A general 2-8-bit quantization toolbox with GPTQ/AWQ/HQQ, with easy export to ONNX/ONNX Runtime ☆149, updated last month
- [USENIX ATC '24] Accelerating the Training of Large Language Models using Efficient Activation Rematerialization and Optimal Hybrid Paral… ☆45, updated 3 months ago
- A MoE implementation for PyTorch, [ATC '23] SmartMoE ☆57, updated last year
- Latency and memory analysis of Transformer models for training and inference ☆355, updated last week
- Compare different hardware platforms via the Roofline Model for LLM inference tasks ☆75, updated 8 months ago
- Disaggregated serving system for Large Language Models (LLMs) ☆359, updated 3 months ago
- Materials for learning SGLang ☆96, updated this week
- LLM inference benchmark ☆350, updated 3 months ago
- FlagScale is a large-model toolkit built on open-source projects ☆169, updated this week
- PyTorch bindings for CUTLASS grouped GEMM ☆68, updated 4 months ago
- A low-latency & high-throughput serving engine for LLMs ☆245, updated 2 months ago
- Export LLaMA to ONNX ☆96, updated 5 months ago
- A text-generation method that returns a generator, streaming out each token in real time during inference, based on Huggingface/… ☆96, updated 8 months ago