wangkuiyi / huggingface-tokenizer-in-cxx ☆49 · Updated last year
Related projects:
- Universal cross-platform tokenizers binding to HF and sentencepiece ☆246 · Updated last month
- Minimal example of using a traced huggingface transformers model with libtorch ☆35 · Updated 4 years ago
- A general 2–8 bit quantization toolbox with GPTQ/AWQ/HQQ, and easy export to ONNX/ONNX Runtime ☆141 · Updated 3 weeks ago
- C++ implementation of Qwen2 and Llama 3 ☆34 · Updated 3 months ago
- The Triton backend for the ONNX Runtime. ☆122 · Updated this week
- A toolkit to help optimize large ONNX models ☆53 · Updated this week
- BERT implemented in pure C++ ☆30 · Updated 4 years ago
- OpenAI-compatible API for the TensorRT-LLM Triton backend ☆148 · Updated last month
- Dynamic batching library for deep learning inference, with tutorials for LLM and GPT scenarios. ☆81 · Updated last month
- Common source, scripts and utilities shared across all Triton repositories. ☆62 · Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆15 · Updated 3 months ago
- The Triton backend for TensorRT. ☆59 · Updated last week
- Transformer-related optimizations, including BERT and GPT ☆17 · Updated last year
- Simplify ONNX models larger than 2 GB ☆41 · Updated 6 months ago
- Transformer-related optimizations, including BERT and GPT ☆58 · Updated 11 months ago
- ggml implementation of embedding models, including SentenceTransformer and BGE ☆50 · Updated 8 months ago
- A quantization algorithm for LLMs ☆98 · Updated 2 months ago
- GGML implementation of BERT model with Python bindings and quantization. ☆51 · Updated 7 months ago
- LLaMa/RWKV onnx models, quantization and testcases ☆345 · Updated last year
- An easy-to-use package for implementing SmoothQuant for LLMs ☆78 · Updated 4 months ago
- A project that optimizes Whisper for low-latency inference using NVIDIA TensorRT ☆47 · Updated 2 months ago
- Whisper in TensorRT-LLM ☆14 · Updated 11 months ago
- Easy and Efficient Quantization for Transformers ☆172 · Updated 2 months ago
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including … ☆130 · Updated 3 weeks ago