OpenBMB / cpm_kernels
☆23 Updated last year
Alternatives and similar repositories for cpm_kernels:
Users who are interested in cpm_kernels are comparing it to the libraries listed below:
- ☆76 Updated last year
- Transformer related optimization, including BERT, GPT ☆39 Updated last year
- Transformer related optimization, including BERT, GPT ☆17 Updated last year
- A more efficient GLM implementation! ☆55 Updated last year
- ☆18 Updated 8 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆16 Updated 7 months ago
- Simple Dynamic Batching Inference ☆145 Updated 2 years ago
- This is a text generation method which returns a generator, streaming out each token in real-time during inference, based on Huggingface/… ☆96 Updated 10 months ago
- Inference framework for MoE layers based on TensorRT with Python binding ☆41 Updated 3 years ago
- Transformer related optimization, including BERT, GPT ☆59 Updated last year
- Summary of system papers/frameworks/codes/tools on training or serving large model ☆56 Updated last year
- ☆127 Updated 3 weeks ago
- OneFlow Serving ☆20 Updated 3 weeks ago
- ☆13 Updated 9 months ago
- Another ChatGLM2 implementation for GPTQ quantization ☆54 Updated last year
- ☢️ TensorRT 2023 second round: Llama model inference acceleration and optimization based on TensorRT-LLM ☆45 Updated last year
- Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios. ☆34 Updated 4 months ago
- ☆33 Updated this week
- ☆114 Updated 10 months ago
- Standalone Flash Attention v2 kernel without libtorch dependency ☆99 Updated 4 months ago
- Models and examples built with OneFlow ☆96 Updated 3 months ago
- Compare different hardware platforms via the Roofline Model for LLM inference tasks. ☆88 Updated 10 months ago
- Efficient, Flexible, and Highly Fault-Tolerant Model Service Management Based on SGLang ☆30 Updated 2 months ago
- Datasets, Transforms and Models specific to Computer Vision ☆84 Updated last year
- Simplify large (>2GB) ONNX models ☆51 Updated last month
- ☆57 Updated 7 months ago
- ☆140 Updated 8 months ago
- ☆21 Updated last year
- A unified tokenization tool for Images, Chinese and English. ☆151 Updated last year
- Implement BERT in pure C++ ☆36 Updated 4 years ago