wangkuiyi / huggingface-tokenizer-in-cxx
☆57 · Updated last year
Alternatives and similar repositories for huggingface-tokenizer-in-cxx:
Users interested in huggingface-tokenizer-in-cxx are comparing it to the libraries listed below.
- ☆124 · Updated last year
- Universal cross-platform tokenizers binding to HF and sentencepiece ☆305 · Updated 2 weeks ago
- A general 2-8 bit quantization toolbox with GPTQ/AWQ/HQQ/VPTQ and easy export to ONNX/ONNX Runtime. ☆159 · Updated last week
- The Triton backend for TensorRT. ☆69 · Updated this week
- LLM deployment project based on ONNX. ☆32 · Updated 4 months ago
- ☆117 · Updated 9 months ago
- Qwen2 and Llama 3 C++ implementation ☆40 · Updated 8 months ago
- Whisper in TensorRT-LLM ☆15 · Updated last year
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models ☆20 · Updated 11 months ago
- The Triton backend for the ONNX Runtime. ☆138 · Updated this week
- ☆172 · Updated 4 months ago
- Standalone Flash Attention v2 kernel without libtorch dependency ☆104 · Updated 5 months ago
- Transformer-related optimization, including BERT, GPT ☆59 · Updated last year
- ☆65 · Updated 2 months ago
- Implementation of BERT in pure C++ ☆36 · Updated 4 years ago
- Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios. ☆35 · Updated 5 months ago
- A Toolkit to Help Optimize Onnx Model ☆114 · Updated 3 weeks ago
- Transformer-related optimization, including BERT, GPT ☆17 · Updated last year
- Triton backend for https://github.com/OpenNMT/CTranslate2 ☆34 · Updated last year
- ☆23 · Updated last year
- Inference RWKV v5, v6 and (WIP) v7 with Qualcomm AI Engine Direct SDK ☆52 · Updated this week
- ☆127 · Updated last month
- ☢️ TensorRT 2023 second round: inference acceleration and optimization of the Llama model based on TensorRT-LLM ☆44 · Updated last year
- Simplify large (>2 GB) ONNX models ☆52 · Updated 2 months ago
- ☆67 · Updated 2 months ago
- ☆140 · Updated 9 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆16 · Updated 8 months ago
- An easy-to-use package for implementing SmoothQuant for LLMs ☆92 · Updated 9 months ago
- Export Llama to ONNX ☆112 · Updated last month
- ☆117 · Updated 11 months ago