GaoYusong / llm.cpp
A C++ port of karpathy/llm.c that features a tiny torch library while maintaining overall simplicity.
☆24 · Updated 6 months ago
Alternatives and similar repositories for llm.cpp:
Users interested in llm.cpp are comparing it to the libraries listed below.
- LLM training in simple, raw C/CUDA ☆91 · Updated 9 months ago
- A simple and fast minimalistic header-only library allowing to run async tasks and execute task graphs. ☆51 · Updated 2 months ago
- Header-only safetensors loader and saver in C++ ☆53 · Updated 2 months ago
- A minimalistic C++ Jinja templating engine for LLM chat templates ☆120 · Updated this week
- A distributed KV store for disaggregated LLM inference ☆31 · Updated this week
- Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O ☆247 · Updated last month
- High-Performance SGEMM on CUDA devices ☆76 · Updated last month
- Lightweight Llama 3 8B Inference Engine in CUDA C ☆45 · Updated last week
- PyTorch from scratch in pure C/CUDA and Python ☆40 · Updated 4 months ago
- Standalone Flash Attention v2 kernel without libtorch dependency ☆104 · Updated 5 months ago
- C++ interfaces for RDMA access ☆66 · Updated last month
- Nsight Compute in Docker ☆11 · Updated last year
- A C++ implementation of an LRU cache ☆38 · Updated 4 years ago
- A quick pool allocator for C++ with type info and GC support ☆2 · Updated 2 years ago
- Teacher Xiao Peng (小彭老师) launches a SYCL 2020 course (work in progress; it will be released later in a livestream) ☆15 · Updated last year
- ⚡️ Write HGEMM from scratch using Tensor Cores with the WMMA, MMA and CuTe APIs, and achieve peak ⚡️ performance. ☆52 · Updated 2 weeks ago
- End-to-end steps for adding custom ops in PyTorch. ☆20 · Updated 4 years ago
- A tool for examining GPU scheduling behavior. ☆71 · Updated 6 months ago
- Fast and memory-efficient exact attention ☆44 · Updated this week
- Task-graph-based asynchronous programming system using C++ coroutines ☆87 · Updated last year
- A Python script to convert the output of NVIDIA Nsight Systems (in SQLite format) to JSON in Google Chrome Trace Event Format. ☆29 · Updated last month
- A simple profiler to count NVIDIA PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆50 · Updated last year
- CUDA/Metal accelerated language model inference ☆512 · Updated 2 months ago
- Common source, scripts and utilities shared across all Triton repositories. ☆68 · Updated last week
- Profiling Taskflow Programs through Visualization ☆49 · Updated last year
- Simple byte-pair encoding mechanism used for tokenization, written purely in C ☆126 · Updated 3 months ago
- A GPU-driven system framework for scalable AI applications ☆112 · Updated 2 weeks ago
- A language and compiler for irregular tensor programs.