antirez / gte-pure-C
Pure C inference for the GTE Small embedding model
☆97 Updated last week
Alternatives and similar repositories for gte-pure-C
Users who are interested in gte-pure-C are comparing it to the repositories listed below.
- A faithful clone of Karpathy's llama2.c (one file inference, zero dependency) but fully functional with LLaMA 3 8B base and instruct mode… ☆143 Updated 3 months ago
- RWKV v7 inference in pure C. ☆43 Updated 3 months ago
- A C++ port of karpathy/llm.c that features a tiny torch library while maintaining overall simplicity. ☆42 Updated last year
- CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning ☆383 Updated 3 weeks ago
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆376 Updated 9 months ago
- Tensor library & inference framework for machine learning ☆117 Updated 3 months ago
- Clover: Quantized 4-bit Linear Algebra Library ☆114 Updated 7 years ago
- LLM training in simple, raw C/CUDA ☆112 Updated last year
- Quantized LLM training in pure CUDA/C++. ☆233 Updated last week
- High-Performance FP32 GEMM on CUDA devices ☆117 Updated last year
- Standalone command-line tool for compiling Triton kernels ☆20 Updated last year
- Learning about CUDA by writing PTX code. ☆151 Updated last year
- A minimalistic C++ Jinja templating engine for LLM chat templates ☆202 Updated 4 months ago
- Fast and vectorizable algorithms for searching in a vector of sorted floating point numbers ☆153 Updated last year
- Hashed Lookup Table based Matrix Multiplication (halutmatmul) - Stella Nera accelerator ☆215 Updated 2 years ago
- GPEmu, a GPU emulator for faster and cheaper prototyping and evaluation of deep learning system research ☆38 Updated last year
- PyTorch from scratch in pure C/CUDA and Python ☆40 Updated last year
- GGUF implementation in C as a library and a tools CLI program ☆301 Updated 5 months ago
- Inference of Mamba and Mamba2 models in pure C ☆196 Updated last week
- throwaway GPT inference ☆141 Updated last year
- Custom PTX Instruction Benchmark ☆138 Updated 11 months ago
- SMAZ2: compression for very short messages for LoRa and embedded devices ☆107 Updated last year
- ☆158 Updated 3 weeks ago
- ctypes wrappers for HIP, CUDA, and OpenCL ☆130 Updated last year
- Simple byte pair encoding (BPE) mechanism for tokenization, written purely in C ☆144 Updated last year
- Lightweight Llama 3 8B Inference Engine in CUDA C ☆53 Updated 10 months ago
- Experiments with BitNet inference on CPU ☆55 Updated last year
- Super fast FP32 matrix multiplication on RDNA3 ☆82 Updated 10 months ago
- Autograd to GPT-2 completely from scratch ☆126 Updated 5 months ago
- A fork of llama3.c used to do some R&D on inferencing ☆22 Updated last year