gf712 / gpt2-cppLinks

GPT2 implementation in C++ using Ort

☆26

Alternatives and similar repositories for gpt2-cpp

Users that are interested in gpt2-cpp are comparing it to the libraries listed below

Sorting:

kroggen / mamba.c
Inference of Mamba models in pure C
☆192Updated last year
gevtushenko / llm.c
LLM training in simple, raw C/CUDA
☆107Updated last year
catid / bitnet_cpu
Experiments with BitNet inference on CPU
☆54Updated last year
leloykun / llama2.cpp
Inference Llama 2 in one file of pure C++
☆84Updated 2 years ago
harrisonvanderbyl / rwkv-cpp-accelerated
A torchless, c++ rwkv implementation using 8bit quantization, written in cuda/hip/vulkan for maximum compatibility and minimum dependenci…
☆313Updated last year
staghado / vit.cpp
Inference Vision Transformer (ViT) in plain C/C++ with ggml
☆295Updated last year
nomic-ai / kompute
General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). …
☆52Updated 8 months ago
daquexian / faster-rwkv
☆124Updated last year
GaoYusong / llm.cpp
A C++ port of karpathy/llm.c features a tiny torch library while maintaining overall simplicity.
☆38Updated last year
BlinkDL / nanoRWKV
RWKV in nanoGPT style
☆193Updated last year
abetlen / ggml-python
Python bindings for ggml
☆146Updated last year
byroneverson / llm.cpp
Fork of llama.cpp, extended for GPT-NeoX, RWKV-v4, and Falcon models
☆28Updated 2 years ago
salykova / sgemm.cu
High-Performance SGEMM on CUDA devices
☆107Updated 9 months ago
iamlemec / bert.cpp
GGML implementation of BERT model with Python bindings and quantization.
☆55Updated last year
intel / neural-speed
An innovative library for efficient LLM inference via low-bit quantization
☆349Updated last year
google / minja
A minimalistic C++ Jinja templating engine for LLM chat templates
☆193Updated last month
wangkuiyi / huggingface-tokenizer-in-cxx
☆69Updated 2 years ago
BlinkDL / fast.c
Prepare for DeekSeek R1 inference: Benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code.
☆74Updated 9 months ago
mscheong01 / speculative_decoding.c
minimal C implementation of speculative decoding based on llama2.c
☆25Updated last year
MollySophia / rwkv-mobile
Inference RWKV with multiple supported backends.
☆65Updated this week
99991 / pygguf
GGUF parser in Python
☆28Updated last year
chu-tianxiang / llama-cpp-torch
llama.cpp to PyTorch Converter
☆34Updated last year
KevlarKanou / rwkv7.c
Inference RWKV v7 in pure C.
☆40Updated 3 weeks ago
PABannier / encodec.cpp
Port of Meta's Encodec in C/C++
☆223Updated 10 months ago
keith2018 / TinyGPT
Tiny C++ LLM inference implementation from scratch
☆88Updated last month
okuvshynov / llama_duo
asynchronous/distributed speculative evaluation for llama3
☆38Updated last year
skeskinen / llama-lite
Embeddings focused small version of Llama NLP model
☆105Updated 2 years ago
syoyo / safetensors-cpp
Header-only safetensors loader and saver in C++
☆69Updated 5 months ago
RWKV / rwkv-onnx
A converter and basic tester for rwkv onnx
☆42Updated last year
pranavjad / tinyllama-bitnet
Train your own small bitnet model
☆75Updated last year