thomaschlt / CUDApple
Exploration work on executing CUDA kernels on Apple Silicon (Metal-compatible code).
☆35 Updated 7 months ago
Alternatives and similar repositories for CUDApple
Users that are interested in CUDApple are comparing it to the libraries listed below
- SIMD quantization kernels ☆94 Updated 5 months ago
- Where GPUs get cooked 👩‍🍳🔥 ☆362 Updated 2 weeks ago
- 1.58 Bit LLM on Apple Silicon using MLX ☆242 Updated last year
- PCCL (Prime Collective Communications Library) implements fault tolerant collective communications over IP ☆141 Updated 4 months ago
- in this repository, i'm going to implement increasingly complex llm inference optimizations ☆81 Updated 8 months ago
- ☆90 Updated last month
- Local Qwen3 LLM inference. One easy-to-understand file of C source with no dependencies. ☆157 Updated 7 months ago
- Tensor library with autograd using only Rust's standard library ☆71 Updated last year
- Simple MPI implementation for prototyping or learning ☆300 Updated 6 months ago
- an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆110 Updated 11 months ago
- Learnings and programs related to CUDA ☆432 Updated 7 months ago
- This repository has code for fine-tuning LLMs with GRPO specifically for Rust Programming using cargo as feedback ☆114 Updated 11 months ago
- 👷 Build compute kernels ☆214 Updated last week
- A minimalistic C++ Jinja templating engine for LLM chat templates ☆203 Updated 4 months ago
- The missing tiktoken training code ☆334 Updated last month
- PTX-Tutorial Written Purely By AIs (Deep Research of OpenAI and Claude 3.7) ☆66 Updated 10 months ago
- Learning about CUDA by writing PTX code. ☆152 Updated last year
- moondream in zig. ☆75 Updated 8 months ago
- MLX port for xjdr's entropix sampler (mimics jax implementation) ☆61 Updated last year
- small auto-grad engine inspired from Karpathy's micrograd and PyTorch ☆276 Updated last year
- A simple MLX implementation for pretraining LLMs on Apple Silicon. ☆85 Updated 5 months ago
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆68 Updated this week
- Quantized LLM training in pure CUDA/C++. ☆235 Updated 2 weeks ago
- Official CLI and Python SDK for Prime Intellect - access GPU compute, remote sandboxes, RL environments, and distributed training infrast… ☆148 Updated this week
- An implementation of delta-iris in tinygrad ☆72 Updated last year
- C API for MLX ☆172 Updated this week
- NanoGPT-speedrunning for the poor T4 enjoyers ☆73 Updated 9 months ago
- Utils for Unsloth https://github.com/unslothai/unsloth ☆188 Updated last week
- port of Andrej Karpathy's llm.c to Mojo ☆363 Updated 6 months ago
- peer-to-peer compute and intelligence network that enables decentralized AI development at scale ☆137 Updated 2 months ago