☆72Feb 27, 2023Updated 3 years ago
Alternatives and similar repositories for huggingface-tokenizer-in-cxx
Users that are interested in huggingface-tokenizer-in-cxx are comparing it to the libraries listed below.
- Universal cross-platform tokenizers binding to HF and sentencepiece☆488Feb 20, 2026Updated 3 months ago
- transformer tokenizers (e.g. BERT tokenizer) in C++ (WIP)☆18Apr 7, 2022Updated 4 years ago
- Try to export the ONNX QDQ model that conforms to the AXERA NPU quantization specification. Currently, only w8a8 is supported.☆11Sep 10, 2024Updated last year
- HuggingFace Transformers WordPiece Tokenizer in C++☆21Mar 14, 2025Updated last year
- Minimal example of using a traced huggingface transformers model with libtorch☆35Sep 17, 2020Updated 5 years ago
- Source code of our implementation of the concurrent RMA☆12May 23, 2019Updated 6 years ago
- A four-dimensional Analysis of Partitioned Approximate Filters☆11Aug 6, 2025Updated 9 months ago
- An on-device, real-time speech recognition app based on Sherpa-ONNX that downloads its models online.☆29Feb 27, 2025Updated last year
- Tutorials of Extending and importing TVM with CMAKE Include dependency.☆16Oct 11, 2024Updated last year
- A SQLite extension for working with float and binary vectors. Work in progress!☆24Feb 10, 2023Updated 3 years ago
- Multiple GEMM operators are constructed with cutlass to support LLM inference.☆20Aug 3, 2025Updated 9 months ago
- BERT Tokenizer in C++☆79Jan 14, 2021Updated 5 years ago
- Port of Facebook's LLaMA model in C/C++☆13May 9, 2026Updated last week
- implement bert in pure c++☆37Apr 29, 2020Updated 6 years ago
- ☆26May 22, 2023Updated 2 years ago
- Grizzly: Efficient Stream Processing Through Adaptive Query Compilation☆16Jun 13, 2020Updated 5 years ago
- OneFlow Serving☆20Apr 10, 2025Updated last year
- An Android client of Stable Diffusion.☆13Mar 29, 2024Updated 2 years ago
- Run Chinese MobileBert model on SNPE.☆15May 19, 2023Updated 3 years ago
- GPU accelerated client-side embeddings for vector search, RAG etc.☆65Dec 4, 2023Updated 2 years ago
- Recording models☆12Sep 19, 2023Updated 2 years ago
- Fast and customizable text tokenization library with BPE and SentencePiece support☆333Jan 10, 2026Updated 4 months ago
- ☆16Jan 24, 2025Updated last year
- Uses the excellent silero VAD with onnxruntime C api for fast detection of audio segments with speech☆16Sep 20, 2024Updated last year
- qwen2 and llama3 cpp implementation☆50Jun 7, 2024Updated last year
- Supporting code for "LLMs for your iPhone: Whole-Tensor 4 Bit Quantization"☆11Mar 31, 2024Updated 2 years ago
- Python ONNX inference sample for Lite-Mono, a monocular depth estimation model☆23Apr 12, 2023Updated 3 years ago
- ☆16Mar 16, 2021Updated 5 years ago
- Code for our paper "Evaluating SIMD Compiler-Intrinsics for Database Systems"☆16Jul 5, 2023Updated 2 years ago
- Fast Cardinality Estimation of Multi-Join Queries Using Sketches☆16Feb 29, 2024Updated 2 years ago
- ☆34Apr 29, 2019Updated 7 years ago
- NVIDIA TensorRT Hackathon 2023 second-round topic: building and optimizing the Tongyi Qianwen Qwen-7B model with TensorRT-LLM☆43Oct 20, 2023Updated 2 years ago
- ☆13Nov 27, 2025Updated 5 months ago
- ☆17Apr 30, 2025Updated last year
- TTG: Template Task Graph C++ API☆26May 9, 2026Updated last week
- Triton backend for https://github.com/OpenNMT/CTranslate2☆35Jul 7, 2023Updated 2 years ago
- Inference RWKV v5, v6 and v7 with Qualcomm AI Engine Direct SDK☆91Updated this week
- the original reference implementation of a specified llama.cpp backend for Qualcomm Hexagon NPU on Android phone, https://github.com/ggml…☆42Jul 14, 2025Updated 10 months ago
- Parallel Wavelet Tree and Wavelet Matrix Construction☆25Jun 27, 2023Updated 2 years ago