geohot / ctypeslib
Generate Python ctypes classes from C headers. Requires LLVM clang.
☆13 · Updated 9 months ago
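As a rough illustration of what "ctypes classes from C headers" means, here is a minimal hand-written sketch. It is not output from ctypeslib; the C structs `point` and `rect` and the Python class names are hypothetical, chosen only to show how a C declaration maps onto `ctypes.Structure` subclasses.

```python
import ctypes

# Hypothetical C declarations this sketch mirrors (not taken from ctypeslib):
#   struct point { int32_t x; int32_t y; };
#   struct rect  { struct point origin; uint32_t width; uint32_t height; };

class Point(ctypes.Structure):
    # Each tuple is (field name, ctypes type), in the same order as the C struct.
    _fields_ = [("x", ctypes.c_int32), ("y", ctypes.c_int32)]

class Rect(ctypes.Structure):
    # A nested struct is expressed by using the other Structure class as the type.
    _fields_ = [
        ("origin", Point),
        ("width", ctypes.c_uint32),
        ("height", ctypes.c_uint32),
    ]

r = Rect(Point(1, 2), 30, 40)
print(ctypes.sizeof(Rect))   # memory layout matches the C struct: 16 bytes
print(r.origin.x, r.width)   # fields are read like ordinary attributes
```

A generator automates this mapping for whole headers, which matters mostly for large APIs where writing `_fields_` lists by hand is error-prone.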
Alternatives and similar repositories for ctypeslib
Users who are interested in ctypeslib are comparing it to the libraries listed below.
- ctypes wrappers for HIP, CUDA, and OpenCL ☆129 · Updated 11 months ago
- FP4 MAC Array ☆19 · Updated last year
- Learning about CUDA by writing PTX code. ☆131 · Updated last year
- pytorch from scratch in pure C/CUDA and python ☆40 · Updated 7 months ago
- This repository contains a simple llama3 implementation in pure JAX. ☆64 · Updated 3 months ago
- asynchronous/distributed speculative evaluation for llama3 ☆38 · Updated 9 months ago
- A C++ port of karpathy/llm.c featuring a tiny torch library while maintaining overall simplicity. ☆33 · Updated 10 months ago
- ☆49 · Updated last year
- Triton implementation of GPT/LLAMA ☆18 · Updated 9 months ago
- NanoGPT-speedrunning for the poor T4 enjoyers ☆66 · Updated last month
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆44 · Updated this week
- ☆12 · Updated 11 months ago
- The Finite Field Assembly Programming Language ☆37 · Updated 2 weeks ago
- FastFeedForward Networks ☆20 · Updated last year
- RDNA3 emulator ☆54 · Updated last month
- Because it's there. ☆16 · Updated 8 months ago
- LLM training in simple, raw C/Metal Shading Language ☆54 · Updated last year
- Standalone commandline CLI tool for compiling Triton kernels ☆18 · Updated 8 months ago
- High-Performance SGEMM on CUDA devices ☆94 · Updated 4 months ago
- ☆34 · Updated last week
- General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). … ☆48 · Updated 3 months ago
- Experiments with BitNet inference on CPU ☆55 · Updated last year
- Minimal C++ implementation of GPT2 ☆40 · Updated last year
- Effort to open-source 10.5 trillion parameter Gemini model. ☆17 · Updated last year
- Course Project for COMP4471 on RWKV ☆17 · Updated last year
- ☆30 · Updated last week
- Custom PTX Instruction Benchmark ☆126 · Updated 3 months ago
- Fork of LLVM with support for downgrading bitcode. ☆19 · Updated this week
- Implementation of GateLoop Transformer in Pytorch and Jax ☆88 · Updated 11 months ago
- ☆46 · Updated last week