Const-me / Cgml
GPU-targeted vendor-agnostic AI library for Windows, and Mistral model implementation.
☆58 · Updated last year
Alternatives and similar repositories for Cgml
Users interested in Cgml are comparing it to the libraries listed below.
- A library for incremental loading of large PyTorch checkpoints ☆56 · Updated 2 years ago
- Tiny Dream - An embedded, header-only Stable Diffusion C++ implementation ☆265 · Updated 2 years ago
- A JPEG Image Compression Service using Partially Homomorphic Encryption ☆31 · Updated 8 months ago
- A fork of llama3.c used to do some R&D on inference ☆22 · Updated 11 months ago
- A playground to make it easy to try crazy things ☆33 · Updated last month
- C++ raytracer that supports custom models. Calculations can run on the CPU using C++11 threads or on the GPU via CUDA. ☆74 · Updated 2 years ago
- Throwaway GPT inference ☆140 · Updated last year
- Official implementation of "WhisperNER: Unified Open Named Entity and Speech Recognition" ☆197 · Updated 8 months ago
- Code sample showing how to run and benchmark models on Qualcomm's Windows PCs ☆103 · Updated last year
- Absolutely minimalistic implementation of a GPT-like transformer using only NumPy (<650 lines) ☆254 · Updated 2 years ago
- Algebraic enhancements for GEMM & AI accelerators ☆281 · Updated 8 months ago
- PyTorch script hot swap: change code without unloading your LLM from VRAM ☆124 · Updated 7 months ago
- Experiments with BitNet inference on CPU ☆54 · Updated last year
- Mistral 7B playing DOOM ☆138 · Updated last year
- Hierarchical Navigable Small Worlds ☆101 · Updated 3 months ago
- An implementation of bucketMul LLM inference ☆223 · Updated last year
- Wang Yi's GPT solution ☆142 · Updated last year
- Richard is gaining power ☆198 · Updated 5 months ago
- ☆198 · Updated 6 months ago
- Iterate quickly with llama.cpp hot reloading; use the llama.cpp bindings with bun.sh ☆49 · Updated 2 years ago
- ☆190 · Updated last year
- A BERT that you can train on a (gaming) laptop ☆208 · Updated 2 years ago
- 256,000,000+ points per plot, 60+ FPS on a shitty laptop. The only limit is the size of your RAM. ☆157 · Updated last week
- A copy of ONNX models, datasets, and code, all in one GitHub repository. Follow the README to learn more. ☆105 · Updated 2 years ago
- A GPU-Accelerated Binary Vector Store ☆47 · Updated 9 months ago
- A live multiplayer trivia game where users can bid for the subject of the next question ☆29 · Updated 7 months ago
- ☆165 · Updated last year
- Lightweight Llama 3 8B inference engine in CUDA C ☆52 · Updated 8 months ago
- GGML implementation of the BERT model with Python bindings and quantization ☆56 · Updated last year
- A graphics engine that executes entirely on the CPU ☆224 · Updated last year