SwayamInSync / pytorch-cpp-cuda-starter
Setting up VS Code to work with PyTorch in C/C++ with CUDA support
☆25Updated last year
Alternatives and similar repositories for pytorch-cpp-cuda-starter
Users that are interested in pytorch-cpp-cuda-starter are comparing it to the libraries listed below
- ☆120Updated 2 months ago
- ☆46Updated 10 months ago
- A PyTorch implementation of the GPT-OSS-20B architecture. All components are coded from scratch: RoPE with YaRN, RMSNorm, SwiGLU with cla…☆204Updated 2 months ago
- This repo has all the basic things you'll need in order to understand complete vision transformer architecture and its various implementa…☆229Updated last year
- Here's all my Python/Numba (CUDA) code for the encoder block I made :)☆71Updated 9 months ago
- Learnings and programs related to CUDA☆432Updated 7 months ago
- PTX-Tutorial Written Purely By AIs (Deep Research of OpenAI and Claude 3.7)☆66Updated 10 months ago
- pytorch from scratch in pure C/CUDA and python☆41Updated last year
- ☆45Updated 9 months ago
- A collection of lightweight interpretability scripts to understand how LLMs think☆89Updated 2 weeks ago
- learning & making kernels in cuda / triton☆22Updated 5 months ago
- Learning about CUDA by writing PTX code.☆152Updated last year
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand☆198Updated 8 months ago
- in this repository, i'm going to implement increasingly complex llm inference optimizations☆81Updated 8 months ago
- ☆90Updated last month
- Quantized LLM training in pure CUDA/C++.☆235Updated 2 weeks ago
- Andrej Karpathy's micrograd implemented in C☆30Updated last year
- Low memory full parameter finetuning of LLMs☆53Updated 6 months ago
- CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning☆287Updated 3 months ago
- qwen3 experiments☆34Updated 7 months ago
- NanoGPT-speedrunning for the poor T4 enjoyers☆73Updated 9 months ago
- coding CUDA everyday!☆73Updated this week
- (WIP) A small but powerful, homemade PyTorch from scratch.☆672Updated last week
- Working implementation of DeepSeek MLA☆45Updated last year
- Inference Llama 2 in C++☆43Updated last year
- KernelBench v2: Can LLMs Write GPU Kernels? - Benchmark with Torch -> Triton (and more!) problems☆21Updated 7 months ago
- Lego for GRPO☆30Updated 8 months ago
- Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization, in pure C.☆23Updated last year
- GPU Kernels☆220Updated 9 months ago
- Notes on "Programming Massively Parallel Processors" by Hwu, Kirk, and Hajj (4th ed.)☆53Updated last year