gf712 / gpt2-cpp
GPT-2 implementation in C++ using Ort (ONNX Runtime)
☆26, updated 4 years ago
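The one-line description above is the only technical detail on this page, so here is a minimal, hypothetical sketch of what GPT-2 greedy decoding through ONNX Runtime's C++ (`Ort`) API generally looks like. The model path, tensor names (`input_ids`, `logits`), prompt tokens, and decoding loop below are illustrative assumptions, not code taken from gf712/gpt2-cpp.

```cpp
// Hypothetical sketch (not from gf712/gpt2-cpp): greedy GPT-2 decoding
// with the ONNX Runtime C++ API. Assumes an exported model whose graph
// takes int64 "input_ids" [batch, seq] and returns float "logits"
// [batch, seq, vocab]; the path and tensor names are illustrative.
#include <onnxruntime_cxx_api.h>

#include <algorithm>
#include <array>
#include <cstdint>
#include <iostream>
#include <vector>

int main() {
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "gpt2");
    Ort::SessionOptions opts;
    Ort::Session session(env, "gpt2.onnx", opts);  // assumed model file

    std::vector<int64_t> ids = {15496, 11, 995};   // pre-tokenized prompt (assumed)
    const int64_t vocab = 50257;                   // GPT-2 vocabulary size
    auto mem = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
    const char* in_names[] = {"input_ids"};
    const char* out_names[] = {"logits"};

    for (int step = 0; step < 20; ++step) {
        std::array<int64_t, 2> shape{1, static_cast<int64_t>(ids.size())};
        Ort::Value input = Ort::Value::CreateTensor<int64_t>(
            mem, ids.data(), ids.size(), shape.data(), shape.size());
        auto out = session.Run(Ort::RunOptions{nullptr},
                               in_names, &input, 1, out_names, 1);
        // Greedy decode: argmax over the last position's vocab logits.
        const float* logits = out[0].GetTensorData<float>();
        const float* last = logits + (ids.size() - 1) * vocab;
        ids.push_back(std::max_element(last, last + vocab) - last);
    }
    for (int64_t id : ids) std::cout << id << ' ';
    std::cout << '\n';
}
```

For brevity the sketch re-runs the full token prefix every step; a practical implementation would feed the model's past key/value tensors back in instead.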
Alternatives and similar repositories for gpt2-cpp
Users interested in gpt2-cpp are comparing it to the libraries listed below.
- Experiments with BitNet inference on CPU (☆55, updated last year)
- LLM training in simple, raw C/CUDA (☆99, updated last year)
- General purpose GPU compute framework built on Vulkan to support 1000s of cross-vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). … (☆48, updated 3 months ago)
- ☆124, updated last year
- High-performance SGEMM on CUDA devices (☆94, updated 4 months ago)
- GGML implementation of the BERT model with Python bindings and quantization (☆55, updated last year)
- Inference Llama 2 in one file of pure C++ (☆83, updated last year)
- Inference of Mamba models in pure C (☆187, updated last year)
- Fork of llama.cpp, extended for GPT-NeoX, RWKV-v4, and Falcon models (☆29, updated last year)
- ☆66, updated 2 years ago
- Using OpenAI's Whisper via whisper.cpp with SFML (☆14, updated 2 years ago)
- Explore training for quantized models (☆18, updated last week)
- Tiny C++11 GPT-2 inference implementation from scratch (☆63, updated 2 weeks ago)
- A converter and basic tester for RWKV ONNX (☆41, updated last year)
- Port of Suno AI's Bark in C/C++ for fast inference (☆52, updated last year)
- Python bindings for ggml (☆141, updated 9 months ago)
- RWKV in nanoGPT style (☆191, updated 11 months ago)
- GGUF parser in Python (☆27, updated 9 months ago)
- Inference RWKV v5, v6, and v7 with the Qualcomm AI Engine Direct SDK (☆68, updated last week)
- Qwen2 and Llama 3 C++ implementation (☆44, updated last year)
- Asynchronous/distributed speculative evaluation for Llama 3 (☆39, updated 9 months ago)
- A torchless C++ RWKV implementation using 8-bit quantization, written in CUDA/HIP/Vulkan for maximum compatibility and minimum dependenci… (☆312, updated last year)
- Train your own small BitNet model (☆71, updated 7 months ago)
- GPU benchmark (☆63, updated 4 months ago)
- Efficient GPU support for LLM inference with x-bit quantization (e.g. FP6, FP5) (☆251, updated 7 months ago)
- vLLM: a high-throughput and memory-efficient inference and serving engine for LLMs (☆86, updated this week)
- Universal cross-platform tokenizer bindings to HF and sentencepiece (☆342, updated last week)
- ☆157, updated this week
- ☆71, updated 2 months ago
- Standalone Flash Attention v2 kernel without a libtorch dependency (☆110, updated 8 months ago)